Our project analyzes hate content targeting Asian groups on Twitter and Reddit. There are two dimensions to our project:
Due to time limitations, we were only able to accomplish our 1st and 2nd goals. We touched briefly on our third goal during the analysis, but found its scope too large for our time frame (it is difficult to measure culture and policies with our current data, and doing so raises privacy issues).
We formulated this project because we saw an increasing amount of hate crime targeting Asian groups after the pandemic began. Discrimination also worsened on social platforms after Trump posted hate content on Twitter. We hope our project can help people become aware of the situation.
!pip install gensim==3.8.3
import gensim, spacy
import gensim.corpora as corpora
from nltk.corpus import stopwords
from spacy.lang.en.stop_words import STOP_WORDS
import pandas as pd
import re
from tqdm import tqdm
import time
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
We used the following keywords to search for the data: "China virus", "China flu", "Kung flu", "Wuhan virus", "#fuckchina", "#chinaisasshoe", "#chinaisterrorist", "#boycottchina", "#blamechina", "#MakeChinaPay", "yellow invader", "rice nigger", "spink", "sideways vaginas/sideways vagina", "chinig", "paki", "Chinese wetback", "Dink"
These words are extracted from two papers (Ziems et al 2020, Vidgen et al 2020) and an online database.
# Keywords 1 - 9 to collect Twitter data
# import libraries
import tweepy
import json
import pandas as pd
import numpy as np
from collections import defaultdict
# Function to read the key file and load keys in a dictionary
def loadKeys(key_file):
    with open(key_file) as f:
        key_dict = json.load(f)
    return key_dict['api_key'], key_dict['api_secret'], key_dict['token'], key_dict['token_secret']
# Authorizing an application to access Twitter account data
KEY_FILE = 'temp-imt547.json'
api_key, api_secret, token, token_secret = loadKeys(KEY_FILE)
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(token, token_secret)
api = tweepy.API(auth)
# Create a dictionary where we can store data
searchPosts_dict = defaultdict(list)
searchUsers_dict = defaultdict(list)
# Search for tweets containing any of hate keywords 1 - 9
search_term = "(China virus OR China flu OR Kung flu OR Wuhan virus OR #fuckchina OR #boycottchina OR #blamechina OR #chinaisterrorist) lang:en"
# test: for tweet in tweepy.Cursor(api.search_30_day, environment_name = 'development', query = search_term, maxResults=100).items(1000):
#for tweet in tweepy.Cursor(api.search_full_archive, environment_name = 'development', query = search_term, fromDate = 201901010000, toDate = 201907010000, maxResults=100).items(1000):
for tweet in tweepy.Cursor(api.search_full_archive, environment_name = 'development', query = search_term, fromDate = 202001010000, toDate = 202007010000, maxResults=100).items(1000):
    # The fields below describe the tweet itself
    searchPosts_dict["Tweet ID"].append(tweet.id)
    # Handle the content
    if hasattr(tweet, "retweeted_status"):  # Check if Retweet
        try:
            searchPosts_dict["Content"].append(tweet.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            searchPosts_dict["Content"].append(tweet.retweeted_status.text)
    else:
        try:
            searchPosts_dict["Content"].append(tweet.extended_tweet["full_text"])
        except AttributeError:
            searchPosts_dict["Content"].append(tweet.text)
    searchPosts_dict["Date"].append(tweet.created_at)
    searchPosts_dict["Retweet Count"].append(tweet.retweet_count)
    searchPosts_dict["Posted By"].append(tweet.user)
    searchPosts_dict["Source"].append(tweet.source)
    searchPosts_dict["In Reply to What Tweet"].append(tweet.in_reply_to_status_id)
    searchPosts_dict["In Reply to Who"].append(tweet.in_reply_to_user_id)
    searchPosts_dict["Reply Content"].append(tweet.reply_count)
    searchPosts_dict["Favorite Count"].append(tweet.favorite_count)
    searchPosts_dict["If Retweets"].append(tweet.retweeted)
    searchPosts_dict["Filter Level"].append(tweet.filter_level)
    # Handle place.country (note: hasattr(tweet, 'place.country') is always False, so check tweet.place instead)
    if tweet.place is not None:
        searchPosts_dict["Location_Country"].append(tweet.place.country)
    else:
        searchPosts_dict["Location_Country"].append(None)
    # Handle place.coordinates
    if tweet.place is not None and tweet.place.bounding_box is not None:
        searchPosts_dict["Location_Coordinates"].append(tweet.place.bounding_box.coordinates)
    else:
        searchPosts_dict["Location_Coordinates"].append(None)
    # Handle possibly sensitive
    if hasattr(tweet, 'possibly_sensitive'):
        searchPosts_dict["Is Sensitive"].append(tweet.possibly_sensitive)
    else:
        searchPosts_dict["Is Sensitive"].append(None)
    # The fields below describe the user of each tweet
    searchUsers_dict["User ID"].append(tweet.user.id)
    searchUsers_dict["Follower Count"].append(tweet.user.followers_count)
    searchUsers_dict["Favourites Count"].append(tweet.user.favourites_count)
    searchUsers_dict["Friends Count"].append(tweet.user.friends_count)
    searchUsers_dict["User Name"].append(tweet.user.name)
    searchUsers_dict["Created At"].append(tweet.user.created_at)
    searchUsers_dict["Location"].append(tweet.user.location)
    searchUsers_dict["If Verified"].append(tweet.user.verified)
    searchUsers_dict["Listed Count"].append(tweet.user.listed_count)
    searchUsers_dict["Statuses Count"].append(tweet.user.statuses_count)
# Convert collected data to dataframe
searchTweets_posts_data = pd.DataFrame(searchPosts_dict)
searchTweets_users_data = pd.DataFrame(searchUsers_dict)
searchTweets_users_data.rename(columns = {"id": "User ID"}, inplace = True)
# Output the files
searchTweets_posts_data.to_csv('twitter_posts_1-9_after.csv', encoding='utf-8')
searchTweets_users_data.to_csv('twitter_users_1-9_after.csv', encoding='utf-8')
# Combine the tweets data
before_tweets2 = pd.read_csv('twitter_posts_1-9_before.csv')
after_tweets2 = pd.read_csv('twitter_posts_1-9_after.csv')
before_tweets1 = pd.read_csv('before_tweets2.csv')
after_tweets1 = pd.read_csv('after_tweets2.csv')
before_tweets2 = before_tweets2[['Tweet ID', 'Content', 'Date', 'Retweet Count', 'Posted By', 'Source', 'In Reply to What Tweet', 'In Reply to Who', 'Location_Country', 'Location_Coordinates', 'Reply Content', 'Favorite Count', 'If Retweets', 'Is Sensitive', 'Filter Level']]
after_tweets2 = after_tweets2[['Tweet ID', 'Content', 'Date', 'Retweet Count', 'Posted By', 'Source', 'In Reply to What Tweet', 'In Reply to Who', 'Location_Country', 'Location_Coordinates', 'Reply Content', 'Favorite Count', 'If Retweets', 'Is Sensitive', 'Filter Level']]
def combinedfs(before_tweets1, before_tweets2):
    # before_tweets2.drop(columns=["Unnamed: 0"], inplace=True)
    names = before_tweets1.columns.to_list()
    df = pd.DataFrame(np.concatenate((before_tweets1.values, before_tweets2.values), axis=0))
    df.columns = names
    return df
df = combinedfs(before_tweets1,before_tweets2)
df.to_csv("before_tweets.csv",index=False)
df2 = combinedfs(after_tweets1,after_tweets2)
df2.to_csv("after_tweets.csv",index=False)
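The `combinedfs` helper above concatenates the raw values with NumPy and reattaches the column names; `pd.concat` expresses the same vertical stack in one call. A minimal sketch with toy column names (not the project's real schema):

```python
import pandas as pd

# Two frames with identical columns, stacked vertically;
# ignore_index gives the result a fresh 0..n-1 index.
a = pd.DataFrame({"Tweet ID": [1, 2], "Content": ["x", "y"]})
b = pd.DataFrame({"Tweet ID": [3], "Content": ["z"]})
combined = pd.concat([a, b], ignore_index=True)
```

`pd.concat` also aligns columns by name, so it tolerates the two frames listing columns in a different order, which the NumPy approach does not.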
# Combine the users data
before_user2 = pd.read_csv(r"..\Twitter-1-9\data\twitter_users_1-9_before.csv")
after_user2 = pd.read_csv(r"..\Twitter-1-9\data\twitter_users_1-9_after.csv")
before_user1 = pd.read_csv(r"..\Twitter-latter9\data\before_user2.csv")
after_user1 = pd.read_csv(r"..\Twitter-latter9\data\after_user2.csv")
def combinedfs(before_user1, before_user2):
    before_user2.drop(columns=["Unnamed: 0"], inplace=True)
    names = before_user1.columns.to_list()
    df = pd.DataFrame(np.concatenate((before_user1.values, before_user2.values), axis=0))
    df.columns = names
    return df
# check column names
col1 = before_user1.columns.to_list()
col2 = before_user2.columns.to_list()
pd.DataFrame(data = [col1,col2])
df1 = combinedfs(before_user1,before_user2)
df1.to_csv("before_user.csv",index=False)
df2 = combinedfs(after_user1,after_user2)
df2.to_csv("after_user.csv",index=False)
# add a boolean feature to separate the two datasets
before = [0]*len(user_before)
user_before['after'] = before
after = [1]*len(user_after)
user_after['after'] = after
# merge the two dataset
def combinedfs(before_user1, before_user2):
    import numpy as np
    names = before_user1.columns.to_list()
    df = pd.DataFrame(np.concatenate((before_user1.values, before_user2.values), axis=0))
    df.columns = names
    return df
user = combinedfs(user_before,user_after)
# change user_id to string (astype returns a new frame, so assign the result back)
user = user.astype({'user_id': 'object'})
# export to csv file
user.to_csv(r"..\Twitter-combined\user_twitter.csv",index=False)
# Keywords 10 - 18 to collect Twitter data
# search for tweets
features = ["id","text","created_at","retweet_count","user","user_id","truncated","extended_tweet","source","in_reply_to_status_id",
"in_reply_to_user_id","place.country","place.coordinates","reply_count","favorite_count",
"retweeted","possibly_sensitive","filter_level"]
user_features =["id","followers_count","favourites_count","friends_count","name",
"created_at","location","verified","listed_count","statuses_count"]
search_terms_list_premium = "(yellow invader OR rice nigger OR sideways asian vagina OR chinig OR Chinese wetback OR dink OR paki OR #MakeChinaPay) lang:en"
## tweets before the covid
before_tweets = tweepy.Cursor(api.search_full_archive,
environment_name= 'speech',
query= search_terms_list_premium,
fromDate = 201901010000,
toDate = 201907010000,
maxResults=100).items(1000)
## tweets after the covid
after_tweets = tweepy.Cursor(api.search_full_archive,
environment_name= 'speech',
query= search_terms_list_premium,
fromDate = 202001010000,
toDate = 202007010000,
maxResults=100).items(1000)
# tidy the dataframe
## before the covid
# put data into the dataframe
before_tweets_df = pd.DataFrame(columns=features)
before_user_df = pd.DataFrame(columns = user_features)
for n, t in enumerate(before_tweets):
    for col in user_features:
        try:
            before_user_df.at[n, col] = t.user.__getattribute__(col)
        except AttributeError:
            before_user_df.at[n, col] = None
    for col in features:
        try:
            before_tweets_df.at[n, col] = t.__getattribute__(col)
        except AttributeError:
            before_tweets_df.at[n, col] = None
# tidy
for i, j in before_tweets_df['truncated'].items():
    # replace user column with user id to make the dataframe smaller
    before_tweets_df.loc[i, 'user_id'] = before_tweets_df.loc[i, 'user'].id
    if j == True:
        # replace truncated text in the 'text' column with full text
        before_tweets_df.loc[i, 'text'] = before_tweets_df['extended_tweet'].iloc[i]['full_text']
# drop columns that will not be used
before_tweets_df.drop(columns=['truncated', 'extended_tweet', 'user'], inplace=True)
# rename the column to match the tweet dataframe
before_user_df.rename(columns={"id": "user_id"}, inplace=True)
## after the covid
def load_data(before_tweets):
    # put data into the dataframe
    before_tweets_df = pd.DataFrame(columns=features)
    before_user_df = pd.DataFrame(columns=user_features)
    for n, t in enumerate(before_tweets):
        for col in user_features:
            try:
                before_user_df.at[n, col] = t.user.__getattribute__(col)
            except AttributeError:
                before_user_df.at[n, col] = None
        for col in features:
            try:
                before_tweets_df.at[n, col] = t.__getattribute__(col)
            except AttributeError:
                before_tweets_df.at[n, col] = None
    # tidy
    for i, j in before_tweets_df['truncated'].items():
        # replace user column with user id to make the dataframe smaller
        before_tweets_df.loc[i, 'user_id'] = before_tweets_df.loc[i, 'user'].id
        if j == True:
            # replace truncated text in the 'text' column with full text
            before_tweets_df.loc[i, 'text'] = before_tweets_df['extended_tweet'].iloc[i]['full_text']
    # drop columns that will not be used
    before_tweets_df.drop(columns=['truncated', 'extended_tweet', 'user'], inplace=True)
    # rename the column to match the tweet dataframe
    before_user_df.rename(columns={"id": "user_id"}, inplace=True)
    return before_tweets_df, before_user_df
after_tweets_df, after_user_df = load_data(after_tweets)
# check the dataframes and output to csv files
def check_shape(before_tweets_df, before_user_df):
    print(before_tweets_df.shape)
    print(before_tweets_df.head())
    print(before_user_df.head())
check_shape(before_tweets_df,before_user_df)
# check_shape(after_tweets_df,after_user_df)
# save to csv files
before_tweets_df.to_csv(r"..\Twitter-latter9\data\before_tweets2.csv",index=False)
before_user_df.to_csv(r"..\Twitter-latter9\data\before_user2.csv",index=False)
after_tweets_df.to_csv(r"..\Twitter-latter9\data\after_tweets2.csv",index=False)
after_user_df.to_csv(r"..\Twitter-latter9\data\after_user2.csv",index=False)
# combine all the twitter data
tweets_before = pd.read_csv('before_tweets.csv')
tweets_after = pd.read_csv('after_tweets.csv')
before = [0]*len(tweets_before)
tweets_before['after'] = before
after = [1]*len(tweets_after)
tweets_after['after'] = after
df = pd.concat([tweets_before,tweets_after],axis = 0)
# drop all null columns
df.drop(columns = ['place.country','place.coordinates'],inplace=True)
df.to_csv('tweets.csv',index=False,encoding='utf-8')
# import libraries
import praw
import pandas as pd
from psaw import PushshiftAPI
from datetime import datetime
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
# set the environment
reddit = praw.Reddit('DEFAULT')
api = PushshiftAPI()
start = int(datetime(2019,1,1).timestamp()) # from 2019/01/01
end = int(datetime(2021,2,17).timestamp()) # to 2021/2/17
# Define the first nine keywords of hate content
hatewords1_9 = 'China+virus|China+flu|Kung+flu|Wuhan+virus|fuck+China|china+asshoe|china+terrorist|boycott+china|blame+china'
# Determine the attributes of interest
filter_keys = ['id', 'selftext', 'title', 'name', 'subreddit', 'author', 'clicked',
               'num_comments', 'created_at', 'distinguished', 'is_original_content',
               'is_self', 'locked', 'over_18', 'stickied',
               'score', 'upvote_ratio']
search = api.search_submissions(after=start,
before=end,
q=hatewords1_9,
filter=filter_keys,
size=3000)
# The maximum number of records for each keyword
max_records = 500
# This function reads the results from the generator and puts them into a dataframe
def add_subs_withMax(search, limit):
    cache = []
    for c in search:
        cache.append(c.d_)
        if len(cache) >= limit:
            return pd.DataFrame(cache)
    return pd.DataFrame(cache)
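The cap that `add_subs_withMax` applies by counting inside the loop can also be expressed with `itertools.islice`, which stops pulling from any generator after a fixed number of items. A sketch against a stand-in generator (the real stream would come from `api.search_submissions`):

```python
from itertools import islice

def take_at_most(gen, limit):
    # Pull at most `limit` items from any generator, then stop.
    return list(islice(gen, limit))

# Stand-in for the Pushshift result stream:
fake_search = (f"record-{i}" for i in range(1000))
rows = take_at_most(fake_search, 500)
```

Because `islice` is lazy, the remaining records are never fetched, which matters when the underlying generator is paging through a rate-limited API.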
# This function searches reddit submissions and returns a list of dataframes
def search_reddit_submission(keywords, start, filter_keys, limit):
    api = PushshiftAPI()
    dfs = []
    # entry_count = {}
    # Loop over the keywords list for searching
    for word in keywords:
        print("Start searching for word [{}] >>>".format(word))
        search = api.search_submissions(q=word, after=start, filter=filter_keys, sort='asc', limit=None)
        # Read submissions from the search result, add them to the final list
        print("> Add result to list...")
        df = add_subs_withMax(search, limit)
        dfs.append(df)
        # Record the length (for debugging)
        # entry_count[word] = df
        print("> Action done, {} entries of records are added".format(len(df)))
    return dfs
# This function searches reddit comments and returns a list of dataframes
def search_reddit_comments(keywords, start, filter_keys, limit):
    api = PushshiftAPI()
    dfs = []
    # entry_count = {}
    # Loop over the keywords list for searching
    for word in keywords:
        print("Start searching for word [{}] >>>".format(word))
        search = api.search_comments(q=word, after=start, filter=filter_keys, sort='asc', limit=None)
        # Read comments from the search result, add them to the final list
        print("> Add result to list...")
        df = add_subs_withMax(search, limit)
        dfs.append(df)
        # Record the length (for debugging)
        # entry_count[word] = df
        print("> Action done, {} entries of records are added".format(len(df)))
    return dfs
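The submission and comment searches above are identical except for which API method they call, so the loop can be factored over a search function passed as an argument. A sketch with a stand-in search function and hypothetical helper names (the real calls would pass `api.search_submissions` or `api.search_comments`):

```python
import pandas as pd

def add_rows_with_max(search, limit):
    # Same capping logic as add_subs_withMax, over plain dicts.
    cache = []
    for c in search:
        cache.append(c)
        if len(cache) >= limit:
            break
    return pd.DataFrame(cache)

def search_platform(search_fn, keywords, limit):
    # search_fn stands in for api.search_submissions / api.search_comments.
    return [add_rows_with_max(search_fn(q=word), limit) for word in keywords]

# Stand-in search function yielding fake records:
fake_fn = lambda q: ({"q": q, "i": i} for i in range(10))
dfs = search_platform(fake_fn, ["paki", "dink"], 5)
```

This removes the duplicated function body and keeps both searches on one capping policy.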
# Search for all submissions
subs_df = search_reddit_submission(keywords, start, filter_keys, max_records)
# Search for all comments
cmt_df = search_reddit_comments(keywords, start, filter_keys, max_records)
print("All searches are done")
# Combine the result into dataframe
all_df_subs = pd.concat(subs_df, ignore_index=True)
all_df_cmt = pd.concat(cmt_df, ignore_index=True)
print("Find {} submissions in total".format(len(all_df_subs)))
# Store the result into csv
all_df_subs.to_csv('./reddit_submission10-18.csv', encoding='utf-8')
all_df_cmt.to_csv('./reddit_comments10-18.csv', encoding='utf-8')
# Set up environment
import praw
import pandas as pd
from time import sleep
from datetime import datetime
reddit = praw.Reddit('DEFAULT')
from psaw import PushshiftAPI
api = PushshiftAPI() # api call
# Fetch submission
start = int(datetime(2019,1,1).timestamp())
end = int(datetime(2021,2,17).timestamp())
hatewords1_9 = 'China+virus|China+flu|Kung+flu|Wuhan+virus|fuck+China|china+asshoe|china+terrorist|boycott+china|blame+china'
# those below are attributes of the 'submission' object
filter_keys = ['id', 'selftext', 'title', 'name', 'subreddit', 'author', 'clicked',
               'num_comments', 'created_at', 'distinguished', 'is_original_content',
               'is_self', 'locked', 'over_18', 'stickied',
               'score', 'upvote_ratio']
search = api.search_submissions(after=start,
before=end,
q=hatewords1_9,
filter=filter_keys,
size=3000)
hate1_submissions = pd.DataFrame([submission.d_ for submission in search])
hate1_submissions.shape # return 17788 rows
# Extract sample data: 2019 1.1 - 7.1
hate = hate1_submissions
# convert the unix timestamp into a datetime column (assumes the standard Pushshift 'created_utc' field)
hate['dateTime'] = pd.to_datetime(hate['created_utc'], unit='s')
h2019 = hate[(hate.dateTime.dt.month <= 7) & (hate.dateTime.dt.year == 2019)].dropna(subset=['selftext', 'title'])
# Extract sample data: 2020 1.1 -7.1
h2020 = hate[(hate.dateTime.dt.month <=7) & (hate.dateTime.dt.year==2020)].dropna(subset=['selftext', 'title'])
# Combine two phases
sample2019first9keyWords = h2019.sample(n=1000)
sample2020first9keyWords = h2020.sample(n=1000)
subSampleFirst9keyWords = pd.concat([sample2019first9keyWords, sample2020first9keyWords])
# Combine all text content
subSampleFirst9keyWords['allText'] = subSampleFirst9keyWords.selftext + ' ' + subSampleFirst9keyWords.title
# Extract attributes of interests
subs_sample1_9 = subSampleFirst9keyWords[['id','author','dateTime','allText','score', 'subreddit']]
reddit = pd.read_csv("Reddit/reddit_submission_sampled_phase2.csv",encoding='utf-8')
tweets = pd.read_csv("Twitter-combined/tweets.csv",encoding='utf-8')
# drop the extra index column
reddit.reset_index(level=0, inplace=True)
# add a categorical variable to differentiate data from reddit and twitter
reddit['platform']=["reddit"]*len(reddit)
tweets['platform']=['twitter']*len(tweets)
# create an 'after' boolean variable for the reddit data
reddit['after']=reddit['dateTime']>='2020'
# create dataframes for each
text_r = pd.DataFrame({'text_id':reddit.id,'text':reddit.allText,'dateTime':reddit.dateTime, 'after':reddit.after, 'platform':reddit.platform})
text_t = pd.DataFrame({'text_id':tweets.id,'text':tweets.text,'dateTime':tweets.created_at, 'after':tweets.after, 'platform':tweets.platform})
## exclude time in dateTime
# reddit['dateTime']=reddit['dateTime'].str.split(" ", n = 1, expand = True)
# tweets['created_at'].str.split(" ", n = 1, expand = True,inplace = True)
# combine two datasets
text_data = pd.concat([text_r,text_t], ignore_index=True)
text_data.sample(10)
| | text_id | text | dateTime | after | platform |
|---|---|---|---|---|---|
| 6397 | 1278077976165392384 | RT @Flexiblexxx: Retweet for.👇 Like for👇... | 2020-06-30 21:28:29 | 1 | twitter |
| 5530 | 1143950794502418433 | “Tariffs are not causing the summer flu, but y... | 2019-06-26 18:34:57 | 0 | twitter |
| 2157 | auj9tn | [deleted]An ugly obese girl in my group of fri... | 2019-02-25 09:13:47 | 0 | reddit |
| 3420 | epku6u | Which Paki MILF is the most fapworthy? | 2020-01-16 15:31:27 | 1 | reddit |
| 4038 | 1145464499845128192 | @piersmorgan You’re such a dink. | 2019-06-30 22:49:53 | 0 | twitter |
| 969 | bg7pdu | This is the weekly UN voting thread, here all ... | 2019-04-22 21:32:11+00:00 | 0 | reddit |
| 358 | bq8toy | &#x200B;\n\nThe following post was made to... | 2019-05-18 21:05:20+00:00 | 0 | reddit |
| 403 | azx3rs | Be real, guys. If he actually got into a sword... | 2019-03-11 18:48:39+00:00 | 0 | reddit |
| 2055 | bs62h1 | I originally made this for my clan, but enjoy... | 2019-05-23 17:54:07 | 0 | reddit |
| 330 | cfh4tc | The author of the following text claims to hav... | 2019-07-20 03:18:26+00:00 | 0 | reddit |
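The `after` flag for the Reddit data was set by comparing the `dateTime` strings directly against `'2020'`. That works because ISO-8601 timestamps sort lexicographically in the same order as chronologically. A quick check on strings shaped like the ones above:

```python
# ISO-8601 strings compare correctly as plain strings,
# so `dateTime >= '2020'` is a valid before/after-pandemic split.
dates = ["2019-06-26 18:34:57", "2020-01-16 15:31:27", "2019-04-22 21:32:11+00:00"]
after_flags = [d >= "2020" for d in dates]
```

This shortcut only holds for zero-padded year-first formats; for anything else, parse with `pd.to_datetime` first.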
# remove the punctuations and extract words
def textcleaner(row):
    import re
    row = row.lower()
    # remove urls
    row = re.sub(r'http\S+', '', row)
    # remove mentions
    row = re.sub(r"(?<![@\w])@(\w{1,25})", '', row)
    # remove hashtags
    row = re.sub(r"(?<![#\w])#(\w{1,25})", '', row)
    # remove other special characters
    row = re.sub('[^A-Za-z .-]+', '', row)
    # remove digits
    row = re.sub(r'\d+', '', row)
    # --------- after 1 round -------------
    # remove free-standing hyphens and hyphens within words
    row = re.sub(r"\s-+\s", ' ', row)
    row = re.sub(r"-", ' ', row)
    # collapse redundant whitespace
    row = re.sub(r"(\s+)", ' ', row)
    # remove period marks
    row = re.sub(r"\.", '', row)
    row = row.strip(" ")
    return row
text_data['cleaned_text']=text_data['text'].apply(textcleaner)
# text_data.to_csv('text_data.csv',encoding = 'utf-8', index = False)
data = text_data
# data = pd.read_csv('text_data.csv',encoding = 'utf-8')
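As a sanity check on the cleaning pipeline, the same substitutions can be applied, in the same order, to a made-up tweet (the sample string below is illustrative, not from the dataset):

```python
import re

# The same substitutions as textcleaner, expressed as an ordered table.
STEPS = [
    (r"http\S+", ""),                # urls
    (r"(?<![@\w])@(\w{1,25})", ""),  # mentions
    (r"(?<![#\w])#(\w{1,25})", ""),  # hashtags
    (r"[^A-Za-z .-]+", ""),          # other special characters
    (r"\d+", ""),                    # digits
    (r"\s-+\s", " "),                # free-standing hyphens
    (r"-", " "),                     # hyphens within words
    (r"(\s+)", " "),                 # redundant whitespace
    (r"\.", ""),                     # period marks
]

def clean(row):
    row = row.lower()
    for pat, repl in STEPS:
        row = re.sub(pat, repl, row)
    return row.strip(" ")

sample = "Check https://t.co/abc @user #hashtag COVID-19!!!"
cleaned = clean(sample)  # -> "check covid"
```

Note the order matters: the URL pattern must run before the special-character pass, or the stripped URL would leave letter fragments behind.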
nlp = spacy.load("en_core_web_sm", disable=["parser","ner",'entity_ruler','entity_linker','sentencizer','textcat'])
# lemmatize and remove stop words - first round
def spacy_processing(text):
    doc = nlp(text)
    doc_lemma = []
    for token in doc:
        if token.is_stop == False:
            doc_lemma.append(token.lemma_)
    return doc_lemma
cleantext = data.cleaned_text.dropna()
tokens_clean = cleantext.apply(spacy_processing)
# --- the topics returned by the first LDA run contain meaningless tokens, e.g. 'rt', 'nt', 'x', so we conduct additional cleaning here ---
# remove extra stops after checking - second round
stop_words = stopwords.words('english') # load NLTK stopwords
stop_words.extend(['nt', 'rt', 'amp','vs'])# add some extra words
tokens_clean_2nd = tokens_clean.apply(lambda x: [word for word in x if word not in stop_words])
# remove single characters, e.g. 'x', that carry no meaning - third round
tokens_clean_3rd = tokens_clean_2nd.apply(lambda x: [word for word in x if len(word) >1])
# append the tokenized text
data['tokenized_text']=tokens_clean_3rd
data.drop(3448,inplace = True)
data.reset_index(drop=True,inplace = True)
data.to_csv('text_data.csv',encoding = 'utf-8', index = False)
The first question we want to analyze is word frequency across all posts. We hope this gives us a general sense of what the problem looks like. The analysis will be conducted on each platform and in aggregate. Specifically, we have the following sub-questions that we want to answer through our analysis.
We use CountVectorizer to count the word frequency of each post, generating a word-frequency table; we then sum across posts to find the total number of appearances of each word. This approach gives an intuitive picture of which words are used most frequently in these posts. We also use TF-IDF to compute a weighted word frequency: in addition to raw frequency, TF-IDF evaluates the importance of each word in its context and scores words based on the weighted results.
# Import libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import warnings
import squarify
import matplotlib
from PIL import Image
warnings.filterwarnings('ignore')
# Set the stop words
stop_words = set(stopwords.words('english'))
# Read from the preprocessed data
hdf = pd.read_csv('./text_data_extra_cleaned.csv')
hdf.drop(columns=['Unnamed: 0'], inplace=True)
hdf['dateTime'] = pd.to_datetime(hdf['dateTime'])
hdf['cleaned_text'] = hdf['cleaned_text'].astype(str)
# Create sub dataframe for further comparison analysis
rd = hdf[hdf['platform'] == 'reddit']
tw = hdf[hdf['platform'] == 'twitter']
bf = hdf[hdf['before'] == 1]
af = hdf[hdf['before'] == 0]
This part generates the table that contains the TF-IDF scores
Help function credits: https://kavita-ganesan.com/extracting-keywords-from-text-tfidf/#.YDmgHxPAS3I
# Generate dataframe using the TF-IDF vectorizer
def tfidf_score(df):
    tfidf = TfidfVectorizer(stop_words=stop_words, tokenizer=word_tokenize)
    response = tfidf.fit_transform(df['cleaned_text'])
    feature_names = tfidf.get_feature_names()
    sorted_items = sort_coo(response.tocoo())
    keywords = extract_topn_from_vector(feature_names, sorted_items, 500)
    # Put into dataframes
    kw_df = pd.DataFrame({'keywords': keywords.keys(), 'tfdif': keywords.values()})
    kw_df.set_index('keywords', inplace=True)
    return kw_df
# Helper functions to get the top-n keywords
def sort_coo(coo_matrix):
    tuples = zip(coo_matrix.col, coo_matrix.data)
    return sorted(tuples, key=lambda x: (x[1], x[0]), reverse=True)

def extract_topn_from_vector(feature_names, sorted_items, topn):
    """get the feature names and tf-idf scores of the top n items"""
    # use only the top n items from the vector
    sorted_items = sorted_items[:topn]
    score_vals = []
    feature_vals = []
    # word index and corresponding tf-idf score
    for idx, score in sorted_items:
        # keep track of the feature name and its corresponding score
        score_vals.append(round(score, 3))
        feature_vals.append(feature_names[idx])
    # create a dict of feature -> score
    # results = zip(feature_vals, score_vals)
    results = {}
    for idx in range(len(feature_vals)):
        results[feature_vals[idx]] = score_vals[idx]
    return results
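The two helpers reduce to sorting (column index, score) pairs and mapping the indices back to feature names. A toy run with hand-made values (not real TF-IDF output) shows the shape of the result:

```python
# Toy (column_index, score) pairs standing in for the COO matrix entries.
pairs = [(0, 0.2), (2, 0.9), (1, 0.5), (3, 0.9)]
feature_names = ["good", "virus", "paki", "dink"]

# Same ordering rule as sort_coo: by score descending, ties broken by index descending.
sorted_items = sorted(pairs, key=lambda x: (x[1], x[0]), reverse=True)

# Same mapping as extract_topn_from_vector, for the top 3.
top3 = {feature_names[i]: round(s, 3) for i, s in sorted_items[:3]}
```

Note this ranks individual cell scores across the whole matrix, so a word's reported score is its single highest per-document weight, not an average.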
# Generate TF-IDF dataframe for all sub datasets
tf_df = tfidf_score(hdf)
tf_tw = tfidf_score(tw)
tf_rd = tfidf_score(rd)
tf_bf = tfidf_score(bf)
tf_af = tfidf_score(af)
This part generates the table that contains the count frequency
# Generate dataframe using CountVectorizer
def cv_score(df):
    cv = CountVectorizer(stop_words=stop_words, tokenizer=word_tokenize)
    cvfit = cv.fit_transform(df['cleaned_text'])
    cv_df = pd.DataFrame(cvfit.toarray(), index=df.text_id, columns=cv.get_feature_names())
    total = cv_df.sum(axis=0).to_dict()
    cv_df = pd.DataFrame({'keywords': total.keys(), 'cv_counts': total.values()})
    cv_df.set_index('keywords', inplace=True)
    return cv_df
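`cv_score` materializes a full document-term matrix just to sum it column-wise. For a quick cross-check of the totals, plain `collections.Counter` over whitespace-tokenized posts gives the same per-word sums. A sketch on toy posts (not project data, and without the stop-word filtering the vectorizer applies):

```python
from collections import Counter

posts = ["china virus china", "wuhan virus", "boycott china"]
totals = Counter()
for post in posts:
    # update() adds the token counts of this post to the running totals
    totals.update(post.split())
```

The Counter route stays O(total tokens) in memory, whereas `.toarray()` densifies an n_docs x n_vocab matrix.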
# Generate CountVectorizer dataframe for all sub datasets
cv_df = cv_score(hdf)
cv_tw = cv_score(tw)
cv_rd = cv_score(rd)
cv_bf = cv_score(bf)
cv_af = cv_score(af)
We will use all the data and compare the results from the two vectorizers
# Combine the dataframe
fr = tf_df.merge(cv_df, how='left', on='keywords')
all_df = [tf_df, tf_tw, tf_rd, tf_bf, tf_af, cv_df, cv_tw, cv_rd, cv_bf, cv_af]
# Save the results
# fr.to_csv('./word_frequency.csv')
# tf_df.to_csv('./frequency/word_frequency_tf_df.csv')
# tf_tw.to_csv('./frequency/word_frequency_tf_tw.csv')
# tf_rd.to_csv('./frequency/word_frequency_tf_rd.csv')
# tf_bf.to_csv('./frequency/word_frequency_tf_bf.csv')
# tf_af.to_csv('./frequency/word_frequency_tf_af.csv')
# cv_df.to_csv('./frequency/word_frequency_cv_df.csv')
# cv_tw.to_csv('./frequency/word_frequency_cv_tw.csv')
# cv_rd.to_csv('./frequency/word_frequency_cv_rd.csv')
# cv_bf.to_csv('./frequency/word_frequency_cv_bf.csv')
# cv_af.to_csv('./frequency/word_frequency_cv_af.csv')
# define function to generate a treemap graph for all the keywords
def treemap(df, title, l=15, w=15):
    plt.figure(figsize=(l, w))
    norm = matplotlib.colors.Normalize(vmin=min(df['cv_counts']), vmax=max(df['cv_counts']))
    colors = [matplotlib.cm.Blues(norm(value)) for value in df['cv_counts']]
    squarify.plot(sizes=df['tfdif'], label=df.index, alpha=.8, color=colors)
    plt.title(title, fontweight="bold")
    plt.axis('off')
    plt.show()
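The color mapping above rescales the counts linearly into [0, 1] before looking them up in the Blues colormap. The same rescaling by hand, for in-range values (hypothetical numbers):

```python
def normalize(value, vmin, vmax):
    # Linear rescale into [0, 1], matching matplotlib.colors.Normalize
    # for values within [vmin, vmax].
    return (value - vmin) / (vmax - vmin)

shade = normalize(50, 0, 100)
```

So the tile *area* encodes the TF-IDF score while the tile *color* independently encodes the raw count, letting one plot compare the two measures.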
# Generate graph for all keywords
treemap(fr, "Keywords Frequency")
We can see from the treemap that the most frequent words overall are neutral, which makes it hard to detect any specific topics.
# Select top30 keywords from each of the vectors
kw_sort = fr.sort_values(by=['tfdif'], ascending=False)[:30]
cv_sort = fr.sort_values(by=['cv_counts'], ascending=False)[:30]
# Close look at the top word frequencies for the count vectorizer
norm = matplotlib.colors.Normalize(vmin=min(cv_sort['tfdif']), vmax=max(cv_sort['tfdif']))
colors = [matplotlib.cm.Blues(norm(value)) for value in cv_sort['tfdif']]
squarify.plot(sizes=cv_sort['cv_counts'], label=cv_sort.index, alpha=.8, color = colors)
plt.title("CountVectorizer TOP30 Keywords Frequency", fontweight="bold")
plt.axis('off')
plt.show()
norm = matplotlib.colors.Normalize(vmin=min(kw_sort['cv_counts']), vmax=max(kw_sort['cv_counts']))
colors = [matplotlib.cm.Blues(norm(value)) for value in kw_sort['cv_counts']]
squarify.plot(sizes=kw_sort['tfdif'], label=kw_sort.index, alpha=.8, color=colors)
plt.title("TF-IDF TOP30 Keywords Frequency", fontweight="bold")
plt.axis('off')
plt.show()
We can see that the count vectorizer and TF-IDF produce different frequent-word lists. Some words, such as "good" and "like", appear in both. TF-IDF surfaces more negative words such as "damn", "disgusting", "frustrating", and "makechinapay", which are absent from the CountVectorizer results.
# Find hate words that are in the index of the dataframe
hate_words = ['china', 'virus', 'wuhan', 'fuckchina', 'chinaisasshoe', 'chinaisterrorist', 'boycottchina', 'blamechina', 'makechinapay', 'rice nigger', 'spink', 'chinig', 'paki', 'dink']
l1 = []
l2 = []
for h in hate_words:
    if h in cv_df.index:
        l1.append(h)
    if h in fr.index:
        l2.append(h)
# Hate words that appear in countvector
cv_df.loc[l1, :]
# Hate words that appear in both results
fr.loc[l2, :]
| keywords | tfdif | cv_counts |
|---|---|---|
| virus | 0.829 | 1897 |
| makechinapay | 1.000 | 1 |
| paki | 1.000 | 1744 |
| dink | 1.000 | 1219 |
We mapped our predefined hate-words list to the frequency table and found that only a small number of these words appear among the top frequent words. Among them, "virus", "paki", and "dink" are the most frequent based on both the count vectorizer and TF-IDF. Hate words such as "makechinapay" have low raw counts but achieve a high TF-IDF score.
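The "makechinapay" pattern (count 1, TF-IDF 1.000) follows from how TF-IDF rewards rarity. Under scikit-learn's default smoothed formula, idf = ln((1 + n) / (1 + df)) + 1, a term seen in a single document is weighted far above a common one; a hand computation with made-up corpus sizes:

```python
import math

def smoothed_idf(n_docs, doc_freq):
    # scikit-learn's default smoothed idf: ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

# Hypothetical corpus of 4000 posts: a common term in 1800 of them,
# a rare term in just one.
idf_common = smoothed_idf(4000, 1800)
idf_rare = smoothed_idf(4000, 1)
```

The score of exactly 1.000 also reflects per-document L2 normalization: a short post whose cleaned text is dominated by one rare token gives that token a normalized weight near 1.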
tw_score = tf_tw.merge(cv_tw, how='left', on='keywords')
rd_score = tf_rd.merge(cv_rd, how='left', on='keywords')
def makeImage(text1, text2, pic1, pic2, title1, title2):
    # mask1 = np.array(Image.open(pic1))
    # mask2 = np.array(Image.open(pic2))
    wc1 = WordCloud(background_color="white", max_words=1000)
    wc2 = WordCloud(background_color="white", max_words=1000)
    # generate word clouds
    wc1.generate_from_frequencies(text1)
    wc2.generate_from_frequencies(text2)
    # show
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 15))
    ax1.imshow(wc1, interpolation="bilinear")
    ax2.imshow(wc2, interpolation="bilinear")
    ax1.set_title(title1)
    ax1.axis("off")
    ax2.set_title(title2)
    ax2.axis("off")
    plt.show()
makeImage(tw_score['cv_counts'].to_dict(), rd_score['cv_counts'].to_dict(), "twitter.png", "reddit.png", "Twitter words frequency", "Reddit words frequency")
By conducting the frequency analysis on text from the two platforms separately, we find that word frequency shows distinctive platform-specific patterns. Posts on Twitter use more neutral-to-positive words, and no particular entity is mentioned significantly more than others except "virus". Posts on Reddit, on the other hand, contain more negative language ("dirty words") than posts on Twitter, and many entities, such as "Trump", are mentioned.
makeImage(tw_score['tfdif'].to_dict(), rd_score['tfdif'].to_dict(), "twitter.png", "reddit.png", "Twitter words frequency", "Reddit words frequency")
The word cloud contains different words when we use the TF-IDF scores, but the general conclusion remains similar: the words on Twitter seem more neutral and mention fewer entities, while the words from Reddit contain many negative or vulgar terms.
The result may be explained by two assumptions: first, people share less emotional text on Twitter than on Reddit; second, Twitter has a more rigorous policy for banning negative posts.
bf_score = tf_bf.merge(cv_bf, how='left', on='keywords')
af_score = tf_af.merge(cv_af, how='left', on='keywords')
makeImage(bf_score['cv_counts'].to_dict(), af_score['cv_counts'].to_dict(), "asia.png", "mask.png", "Before COVID", "After COVID")
Word frequency also shows different patterns before and after the pandemic. Before the pandemic, people used more positive words: "good" and "well" were more frequent than others. After the pandemic, the frequent-words list changes a lot: "good" and "well" are no longer used as often, and the topics become much more scattered. Without doubt, the pandemic changed a great deal in people's daily lives.
makeImage(bf_score['tfdif'].to_dict(), af_score['tfdif'].to_dict(), "asia.png", "mask.png", "Before COVID", "After COVID")
# Get data for twitter
# Note: each comparison needs its own parentheses, because & binds
# more tightly than == in Python
mask1 = (hdf['before'] == 1) & (hdf['platform'] == 'twitter')
mask2 = (hdf['before'] == 0) & (hdf['platform'] == 'twitter')
tw_bf = hdf[mask1]
tw_af = hdf[mask2]
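One pitfall worth noting when building these masks: in pandas, `&` binds more tightly than `==`, so without parentheses an expression like `hdf['before'] == 0 & (...)` is evaluated as `hdf['before'] == (0 & (...))`, which silently drops the platform condition. A minimal sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'before': [1, 1, 0],
                   'platform': ['twitter', 'reddit', 'twitter']})

# parenthesised: selects rows where both conditions hold
good = (df['before'] == 1) & (df['platform'] == 'twitter')

# unparenthesised: 0 & (...) is always 0, so this degenerates to
# before == 0 and ignores the platform condition entirely
bad = df['before'] == 0 & (df['platform'] == 'twitter')
```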
tw_bf_tf = tfidf_score(tw_bf)
tw_af_tf = tfidf_score(tw_af)
tw_bf_cv = cv_score(tw_bf)
tw_af_cv = cv_score(tw_af)
twbf = tw_bf_tf.merge(tw_bf_cv, how='left', on='keywords')
twaf = tw_af_tf.merge(tw_af_cv, how='left', on='keywords')
makeImage(twbf['cv_counts'].to_dict(), twaf['cv_counts'].to_dict(), "twitter.png", "twitter.png", "Before COVID", "After COVID")
makeImage(twbf['tfdif'].to_dict(), twaf['tfdif'].to_dict(), "twitter.png", "twitter.png", "Before COVID", "After COVID")
Before the pandemic, posts on Twitter are more policy-related; people talk more about "us" and "trump". The pattern changes after the pandemic: "virus" becomes a common word across posts, and more insulting language is used.
mask1 = (hdf['before'] == 1) & (hdf['platform'] == 'reddit')
mask2 = (hdf['before'] == 0) & (hdf['platform'] == 'reddit')
rd_bf = hdf[mask1]
rd_af = hdf[mask2]
rd_bf_tf = tfidf_score(rd_bf)
rd_af_tf = tfidf_score(rd_af)
rd_bf_cv = cv_score(rd_bf)
rd_af_cv = cv_score(rd_af)
rdbf = rd_bf_tf.merge(rd_bf_cv, how='left', on='keywords')
rdaf = rd_af_tf.merge(rd_af_cv, how='left', on='keywords')
makeImage(rdbf['cv_counts'].to_dict(), rdaf['cv_counts'].to_dict(), "reddit.png", "reddit.png", "Before COVID", "After COVID")
makeImage(rdbf['tfdif'].to_dict(), rdaf['tfdif'].to_dict(), "reddit.png", "reddit.png", "Before COVID", "After COVID")
It’s interesting that word frequency looks almost the same on Reddit before and after COVID. This may be due to the different regulations of the two platforms. Moreover, the topics discussed on the two platforms vary a lot: Reddit has less policy-related content than Twitter, which may also explain why it doesn't change much before and after COVID.
import pandas as pd
import numpy as np
# Import the data
text_df = pd.read_csv('text_data.csv')
# Further clean to prevent the words that have numbers
import re, string
def clean_text_vadar(row):
    row = row.lower()
    # remove urls
    row = re.sub(r'http\S+', '', row)
    # remove mentions
    row = re.sub(r"(?<![@\w])@(\w{1,25})", '', row)
    # remove hashtags
    row = re.sub(r"(?<![#\w])#(\w{1,25})", '', row)
    # remove other special characters
    row = re.sub('[^A-Za-z .-]+', '', row)
    # --------- after 1 round -------------
    # remove single hyphens and hyphens inside words
    row = re.sub(r"\s-+\s", ' ', row)
    row = re.sub(r"-", ' ', row)
    # collapse redundant whitespace
    row = re.sub(r"(\s+)", ' ', row)
    # remove period marks
    row = re.sub(r"\.", '', row)
    row = row.strip(" ")
    # --------- after 2 round -------------
    # remove data in brackets
    row = re.sub(r'\[.*?\]', '', row)
    # remove words containing numbers
    row = re.sub(r'\w*\d\w*', '', row)
    return row
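A quick check of the cleaner's main steps on a made-up tweet (a condensed, stdlib-only sketch using the same regexes):

```python
import re

sample = "Check https://t.co/abc123 @user #BoycottChina costs $5!"
s = sample.lower()
s = re.sub(r'http\S+', '', s)                # drop URLs
s = re.sub(r"(?<![@\w])@(\w{1,25})", '', s)  # drop @mentions
s = re.sub(r"(?<![#\w])#(\w{1,25})", '', s)  # drop #hashtags
s = re.sub('[^A-Za-z .-]+', '', s)           # drop digits and punctuation
s = re.sub(r"(\s+)", ' ', s).strip(" ")      # collapse whitespace
print(s)  # -> check costs
```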
further = lambda x: clean_text_vadar(x)
# Let's take a look at the updated text
text_df_clean = pd.DataFrame(text_df.text.apply(further))
# Sentiment analysis with VADER
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
scores = []
for tweet in text_df_clean['text']:
    scores.append(analyzer.polarity_scores(tweet))
negative = []
netural = []
positive = []
compound = []
for result in scores:
    negative.append(result.get("neg"))
    netural.append(result.get("neu"))
    positive.append(result.get("pos"))
    compound.append(result.get("compound"))
# Result
vaderSentiment = {'original_tweet': text_df_clean['text'],
'sentiment_scores': scores}
sentimentVa = pd.DataFrame.from_dict(vaderSentiment)
sentimentVa
| original_tweet | sentiment_scores | |
|---|---|---|
| 0 | i am not sure the canadian government has the ... | {'neg': 0.17, 'neu': 0.78, 'pos': 0.05, 'compo... |
| 1 | so im a college student i major in the arts an... | {'neg': 0.096, 'neu': 0.744, 'pos': 0.16, 'com... |
| 2 | many chinese men want to marry with white wome... | {'neg': 0.094, 'neu': 0.793, 'pos': 0.112, 'co... |
| 3 | thanks to conservative treehouse and uhavocbg ... | {'neg': 0.112, 'neu': 0.811, 'pos': 0.077, 'co... |
| 4 | procter amp gambles chief brand officer marc p... | {'neg': 0.038, 'neu': 0.877, 'pos': 0.085, 'co... |
| ... | ... | ... |
| 7995 | too late youve tweeted plenty about china how ... | {'neg': 0.0, 'neu': 0.897, 'pos': 0.103, 'comp... |
| 7996 | your right is was later but the virus was not ... | {'neg': 0.163, 'neu': 0.76, 'pos': 0.078, 'com... |
| 7997 | a new strain of the hn swine flu virus is spre... | {'neg': 0.15, 'neu': 0.85, 'pos': 0.0, 'compou... |
| 7998 | china this isnt the first time for their pande... | {'neg': 0.174, 'neu': 0.826, 'pos': 0.0, 'comp... |
| 7999 | you wonder if one should even believe this man... | {'neg': 0.077, 'neu': 0.828, 'pos': 0.095, 'co... |
8000 rows × 2 columns
vaderSentimentSplit = {'negative_score': negative,
'netural_score': netural,
'positive_score': positive,
'compound_score': compound}
scoresDetails = pd.DataFrame.from_dict(vaderSentimentSplit)
scoresDetails
| negative_score | netural_score | positive_score | compound_score | |
|---|---|---|---|---|
| 0 | 0.170 | 0.780 | 0.050 | -0.9959 |
| 1 | 0.096 | 0.744 | 0.160 | 0.9950 |
| 2 | 0.094 | 0.793 | 0.112 | 0.2820 |
| 3 | 0.112 | 0.811 | 0.077 | -0.3612 |
| 4 | 0.038 | 0.877 | 0.085 | 0.9897 |
| ... | ... | ... | ... | ... |
| 7995 | 0.000 | 0.897 | 0.103 | 0.4767 |
| 7996 | 0.163 | 0.760 | 0.078 | -0.6431 |
| 7997 | 0.150 | 0.850 | 0.000 | -0.6124 |
| 7998 | 0.174 | 0.826 | 0.000 | -0.7096 |
| 7999 | 0.077 | 0.828 | 0.095 | -0.0258 |
8000 rows × 4 columns
# Now concat the two dataframes
text_df = pd.concat([text_df, sentimentVa['sentiment_scores'], scoresDetails], axis=1)
text_df['after covid']=text_df['before'].replace({0:1,1:0})
# Visualizations for comparison about the how the averages of the scores change
groupByTime = text_df.groupby('after covid')
negaComparison = groupByTime.negative_score.mean()
negaComparison.plot.bar(color = ['green', 'blue'])
<AxesSubplot:xlabel='after covid'>
The average negativity score of the posts after COVID is higher than before, which indicates that posts may have become more negative.
neturComparison = groupByTime.netural_score.mean()
neturComparison.plot.bar(color = ['green', 'blue'])
<AxesSubplot:xlabel='after covid'>
The average neutral score of the posts after COVID is almost the same as before.
posiComparison = groupByTime.positive_score.mean()
posiComparison.plot.bar(color = ['green', 'blue'])
<AxesSubplot:xlabel='after covid'>
The average positivity score of the posts after COVID is lower than before, which indicates that posts may have become less positive.
compComparison = groupByTime.compound_score.mean()
compComparison.plot.bar(color = ['green', 'blue'])
<AxesSubplot:xlabel='after covid'>
The average compound score of the posts after COVID is much more negative compared to before, which again indicates that posts on the two platforms may have become more negative after COVID.
# Distribution
# About how the distribution of the scores change:
plot = text_df.negative_score[text_df.before==1].plot.hist(
alpha=0.3, label='Before Covid', legend=True,
histtype='stepfilled', edgecolor='black')
text_df.negative_score[text_df.before==0].plot.hist(
alpha=0.3, label='After Covid', legend=True,
histtype='stepfilled', edgecolor='black',
ax=plot)
<AxesSubplot:ylabel='Frequency'>
The number of posts with high negativity scores, i.e. posts that are clearly negative, increased after COVID.
plot = text_df.netural_score[text_df.before==1].plot.hist(
alpha=0.3, label='Before Covid', legend=True,
histtype='stepfilled', edgecolor='black')
text_df.netural_score[text_df.before==0].plot.hist(
alpha=0.3, label='After Covid', legend=True,
histtype='stepfilled', edgecolor='black',
ax=plot)
<AxesSubplot:ylabel='Frequency'>
The number of posts with high neutral scores, i.e. posts that are clearly neutral, was higher before COVID.
plot = text_df.positive_score[text_df.before==1].plot.hist(
alpha=0.3, label='Before Covid', legend=True,
histtype='stepfilled', edgecolor='black')
text_df.positive_score[text_df.before==0].plot.hist(
alpha=0.3, label='After Covid', legend=True,
histtype='stepfilled', edgecolor='black',
ax=plot)
<AxesSubplot:ylabel='Frequency'>
The number of posts with high positivity scores, i.e. posts that are clearly positive, decreased after COVID.
plot = text_df.compound_score[text_df.before==1].plot.hist(
alpha=0.3, label='Before Covid', legend=True,
histtype='stepfilled', edgecolor='black')
text_df.compound_score[text_df.before==0].plot.hist(
alpha=0.3, label='After Covid', legend=True,
histtype='stepfilled', edgecolor='black',
ax=plot)
<AxesSubplot:ylabel='Frequency'>
The distribution of the compound scores confirms again that there were more positive posts and fewer negative posts before COVID.
# Make sentiments analysis decisions on each text based on the Vader scores
decisionsTotal = []
for row in text_df['compound_score']:
    if row >= 0.05:
        decisionsTotal.append("positive")
    elif row <= -0.05:
        decisionsTotal.append("negative")
    else:
        decisionsTotal.append("neutral")
decisionsTotal = {'results': decisionsTotal}
sentiment_results = pd.DataFrame.from_dict(decisionsTotal)
text_df = pd.concat([text_df, sentiment_results], axis=1)
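The thresholding above follows the usual VADER convention (compound >= 0.05 is positive, <= -0.05 is negative, otherwise neutral). As a reusable function, with test values taken from the score table above:

```python
def vader_label(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

labels = [vader_label(c) for c in (-0.9959, -0.0258, 0.4767)]
# -> ['negative', 'neutral', 'positive']
```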
# Cross platforms analysis
# Comparison between Twitter and Reddit, to show how the number of the posts under each sentiment category change
# before and after covid on these two platforms
import seaborn as sns
compare = text_df.groupby(['after covid', 'platform', 'results']).count().reset_index()
sns.catplot(x='results',y = 'text_id',data = compare,hue = 'after covid',row ='platform',kind = 'bar')
<seaborn.axisgrid.FacetGrid at 0x7f8daecee430>
We can observe from these charts that on Reddit the numbers of posts in each sentiment category do not change much before and after COVID. On Twitter, by contrast, negative posts increased substantially after COVID, while positive and neutral posts decreased. The changes therefore show different tendencies on the two platforms. Considering that each platform has its own way of dealing with hate content, i.e. community policy, one assumption is that Reddit may have done significant moderation work to remove posts deemed hateful.
!pip install liwc==0.5.0
Collecting liwc==0.5.0 Downloading liwc-0.5.0-py2.py3-none-any.whl (5.1 kB) Installing collected packages: liwc Successfully installed liwc-0.5.0
# import the liwc dictionary
import liwc
# sed -i -e '/[<(]/d' LIWC2007_English100131.dic
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
from collections import Counter
features = ['negemo','anger','swear','posemo','affect','death','health']
# define a function to help classify categories
def isIn(x, emo):
    if Counter(category for token in x.split() for category in parse(str(token)))[emo] > 0:
        return 1
    return 0
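The `parse` function comes from the liwc package and maps each token to its LIWC categories. To show what `isIn` is doing, here is a self-contained sketch with a tiny invented lexicon standing in for the real dictionary:

```python
from collections import Counter

# hypothetical mini-lexicon; the real LIWC dictionary maps thousands of words
toy_lexicon = {'hate': ['negemo', 'anger'],
               'damn': ['negemo', 'swear'],
               'happy': ['posemo', 'affect']}

def toy_parse(token):
    return toy_lexicon.get(token, [])

def is_in(text, emo):
    # 1 if any token in the text belongs to the category, else 0
    cats = Counter(cat for tok in text.split() for cat in toy_parse(tok))
    return int(cats[emo] > 0)
```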
# add categories to columns
for n in range(len(data)):
    for col in features:
        try:
            data.at[n, col] = isIn(data.at[n, 'cleaned_text'], col)
        except TypeError:
            print(n)
data.to_csv('liwc look up results.csv', encoding = 'utf-8', index = False)
compare = data.groupby(['after', 'platform'])[features[0]].sum().reset_index()
compare['attitude'] = ['negemo'] * 4
compare.rename(columns={'negemo': 'count'}, inplace=True)
def compute_attitude(compare, emo):
    extra = data.groupby(['after', 'platform'])[emo].sum().reset_index()
    extra.rename(columns={emo: 'count'}, inplace=True)
    extra['attitude'] = [emo] * 4
    return pd.concat([compare, extra], ignore_index=True)
for f in features:
    compare = compute_attitude(compare, f)
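The loop above builds the long-format table one category at a time; an equivalent long-format table can also be built in a single pass with `groupby().sum()` followed by `melt`. A sketch on an invented frame:

```python
import pandas as pd

toy = pd.DataFrame({'after': [0, 0, 1, 1],
                    'platform': ['twitter', 'reddit', 'twitter', 'reddit'],
                    'negemo': [1, 0, 1, 1],
                    'posemo': [0, 1, 0, 0]})

# sum each category per (after, platform) group, then unpivot the
# category columns into an 'attitude' / 'count' pair
long_form = (toy.groupby(['after', 'platform'])[['negemo', 'posemo']].sum()
                .reset_index()
                .melt(id_vars=['after', 'platform'],
                      var_name='attitude', value_name='count'))
```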
# compare = pd.read_csv('compare.csv')
sns.set_style("whitegrid")
g = sns.catplot(x='attitude',y = 'count',data = compare,
hue = 'after',row ='platform',kind = 'bar',palette='BuPu')
g.legend.set_bbox_to_anchor((.9, .9))
g.savefig('liwc.png')
# Import the tweets data
tweets_df = pd.read_csv('tweets.csv')
# Import the sentiment data
sentiment_df = pd.read_csv('sentiment_results.csv')
# Find the most negative tweets
sentiment_df = sentiment_df.sort_values(by=['compound_score'])
# Pick negative tweets
negative_texts = sentiment_df[sentiment_df["sentiment_results"] == 'negative']
negative_texts = negative_texts[negative_texts["platform"] == 'twitter']
# Pick top favorite tweets
favorite = tweets_df.astype({"favorite_count": float})
favorite = favorite.sort_values(by=['favorite_count'], ascending=False)
favorite = favorite.iloc[0:1890, :]
import matplotlib.pyplot as plt
from matplotlib_venn import venn3, venn3_circles
from matplotlib_venn import venn2, venn2_circles
# Find the intersection
negative_texts = negative_texts.astype({"text_id": float})
favorite = favorite.astype({"id": float})
# Venn diagram to see if there is intersection
d1 = negative_texts['text_id'].tolist()
d2 = favorite['id'].tolist()
venn2([set(d1), set(d2)])
plt.show()
We can see from the results that on Twitter, of the 1890 negative tweets, 950 are also among the 1890 most-favorited tweets, about 50.26%.
# Pick top commented tweets
reply = tweets_df.astype({"reply_count": float})
reply = reply.sort_values(by=['reply_count'], ascending=False)
reply = reply.iloc[0:1890, :]
# Find the intersection
reply = reply.astype({"id": float})
# Venn diagram to see if there is intersection
d3 = reply['id'].tolist()
venn2([set(d1), set(d3)])
plt.show()
We can see from the results that of the 1890 negative tweets, 961 are also among the 1890 most-replied tweets, about 50.85%.
# Load the data
sub_senti = pd.read_csv('sub_senti.csv')
sub_senti = sub_senti.dropna()
# Get the top most negative submission
top200Neg = sub_senti.sort_values(by='compound_score').head(200)
# Get the top highest scored submission
top200Score = sub_senti.sort_values(by='score', ascending=False).head(200)
# See the intersection
negset = set(top200Neg.text_id)
scoreset = set(top200Score.text_id)
common = negset.intersection(scoreset)
print('Proportion of overlapping submissions:', len(common)*100/200, '%')
Between the top 200 most negative submissions and the top 200 submissions with the highest karma scores, only 7.5% overlap, which is quite different from the Twitter finding.
# Analyze the upvote/downvote numbers of the reddit submissions
redditSentiments = text_df[text_df["platform"] == "reddit"]
# Import the original reddit datasets that contains the upvotes and downvotes information
votes_df = pd.read_csv('reddit_submission_sampled_phase2.csv')
# Use "Upvote - Downvote" as the score to see how users respond to content of different sentiment categories on Reddit
votesAnalysis = pd.concat([redditSentiments, votes_df['score']], axis=1)
g = sns.catplot(x="after covid", y="score", col="results",
data=votesAnalysis, saturation=.5,
kind="bar", ci=None, aspect=.6)
(g.set_axis_labels("", "score")
.set_xticklabels(["before covid", "after covid"])
.set_titles("{col_name} {col_var}")
.set(ylim=(0, 70))
.despine(left=True))
Above are the bar charts of the average scores for the three sentiment categories, before and after COVID. Note that outlier data points were removed before plotting. The charts show that while people do not endorse negative posts as much after COVID, they were much more willing to endorse positive and neutral posts before COVID. This finding, again, differs from the Twitter finding.
# in case the text will be truncated
pd.set_option('display.max_colwidth', 0)
pd.set_option('display.max_rows',99)
# combine the data
sentiment_analysis = pd.read_csv("sentiment_results.csv")
tweets = pd.read_csv('Twitter-combined/tweets.csv')
users = pd.read_csv('Twitter-combined/user_twitter.csv')
# filter out the sentiment scores of tweets
senti = sentiment_analysis[sentiment_analysis['platform']=='twitter']
# prepare the datasets for concatenation
users.rename(columns = {'created_at':'user_created_at'},inplace = True)
users.drop(columns = ['user_id','before'],inplace =True)
tweets.rename(columns = {'id':'text_id'},inplace=True)
# join the two dataframes
# and change all the ids to object type
data_to_use = pd.concat([tweets,users],axis = 1).astype(
{'in_reply_to_status_id':'object','text_id': 'object','user_id':'object',
'in_reply_to_user_id':'object'})
# drop the redundant columns
senti.drop(columns=['text_id','text', 'dateTime', 'platform','after covid'],inplace = True)
senti.reset_index(drop=True,inplace=True)
# combine sentiment results and tweets features
senti_to_use = pd.concat([data_to_use,senti],axis = 1)
# the most common sources of the tweets
t = senti_to_use.groupby(['after','source']).size().reset_index(name = 'count')
f, ax = plt.subplots(figsize=(10,5))
sns.barplot(data = t[t['count']>10].sort_values(by ='count',ascending = False), x = 'source', y = 'count', hue = 'after',palette='BuPu')
ax.set(title = 'the most common source of the tweets')
[Text(0.5, 1.0, 'the most common source of the tweets')]
Most of the tweets came from Android users. After COVID, more iPhone and Web App users are detected.
# rank out the top 200 tweets with the lowest compound score
top_low_compound = data.sort_values(by='compound_score',ascending=True).head(200)
# check duplicate texts
print('number of duplicate text entries in the top low compound sentiment score:',top_low_compound.text.duplicated().sum())
low_comp_duplicates = top_low_compound[top_low_compound.text.duplicated(keep='last')]
low_comp_duplicates[['after','text']]
number of duplicate text entries in the top low compound sentiment score: 99
| after | text | |
|---|---|---|
| 1937 | 0 | @DailyMirror @rickygervais #China is the most evil place on the planet. Intentionally terrorizing, torturing, dogs, cats, animals. They're missing a soul, evil demonic zombies May all dog farmers, butchers, dogmeat eaters die the same way as the dogs... #FuckChina End #DogMeat |
| 1152 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 1111 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 1138 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 1137 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 1130 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 1143 | 0 | @shishnfips @argentomaris1 #BOYCOTTCHINA\n#SHAMEONYOU #CHINA\nThey kill animals with the most cruel tortures because they think the pain will make their meat more "delicious"!!!\nLet us support the good people in China who are fighting against this brutality! |
| 3419 | 1 | @sortbinue It was from china, but *trumps incompetence, and lies has changed that.\n\nIt is now the #TrumpVirus ,\nIt is through his inaction, and Republicans that the virus mutated and poses a bigger threat to the World. \n#TrumpLiesPeopleDie https://t.co/2xs1tiN2rf |
| 3216 | 1 | @sortbinue It was from china, but *trumps incompetence, and lies has changed that.\n\nIt is now the #TrumpVirus ,\nIt is through his inaction, and Republicans that the virus mutated and poses a bigger threat to the World. \n#TrumpLiesPeopleDie https://t.co/2xs1tiN2rf |
| 1641 | 0 | #StopYulin #StopTheTorture #China This is cruel evilness & must end! #BoycottChina until they stop this horror! #YulinDogMeatFestival\n#Yulin Festival China: How Dogs & Cats Are #Tortured To Death “To Make Meat Tastier” https://t.co/qdaWRIMnCG via @AnonHQ |
| 1668 | 0 | #StopYulin #StopTheTorture #China This is cruel evilness & must end! #BoycottChina until they stop this horror! #YulinDogMeatFestival\n#Yulin Festival China: How Dogs & Cats Are #Tortured To Death “To Make Meat Tastier” https://t.co/qdaWRIMnCG via @AnonHQ |
| 1074 | 0 | @realDonaldTrump How about you tell China to stop torturing, killing and eating dogs? We should have no dealings with them until the my stop their government sanctioned torture of animals. The Yulin Dog Meat Festival is happening RIGHT NOW!!! #BoycottChina #StopYulin |
| 3837 | 1 | @realDonaldTrump The virus came from China true but the tremendous damage done to the US has been due to your horrible incompetence. |
| 3269 | 1 | C-Virus circulated in EU (Spain Germany France Italy) in NOV & DEC 2019 - so some egghead scientists say. It starts making sense. That is when almost everyone got sick, some only little bit sick 1 week, some badly sick 6 weeks. THEN we heard of Virus. Wuhan went DARK in OCT 2019. |
| 1473 | 0 | @Change @LisaVanderpump @RepHastingsFL Signed previously and YES, The WHOLE W🌎RLD Wants This Hell, Horror and Heartbreak To End Already!!!\nSend Emails To @realDonaldTrump👇\nhttps://t.co/WYzQGtlK7D\n#ChinaTradeDeal\n#YulinDogMeatFestival \n#STOPYULIN\n#China\n#EndDogMeatTrade\n#BoycottChina https://t.co/rx2R7A9XWf |
| 1496 | 0 | @Change @LisaVanderpump @RepHastingsFL Signed previously and YES, The WHOLE W🌎RLD Wants This Hell, Horror and Heartbreak To End Already!!!\nSend Emails To @realDonaldTrump👇\nhttps://t.co/WYzQGtlK7D\n#ChinaTradeDeal\n#YulinDogMeatFestival \n#STOPYULIN\n#China\n#EndDogMeatTrade\n#BoycottChina https://t.co/rx2R7A9XWf |
| 3276 | 1 | @mommamia1217 He is busy being pissed at China for his lack of response to the virus here. Always looking to blame someone else for his failures. |
| 3774 | 1 | Obama says he ‘doesn’t want to live in Trump’s America’ and that he’s ‘pissed off’ by COVID being called ‘kung flu’ – The US Sun https://t.co/Wg26K0ZieG 🔥 bye bitch |
| 3569 | 1 | Obama says he ‘doesn’t want to live in Trump’s America’ and that he’s ‘pissed off’ by COVID being called ‘kung flu’ – The US Sun https://t.co/Wg26K0ZieG 🔥 bye bitch |
| 3719 | 1 | Obama says he ‘doesn’t want to live in Trump’s America’ and that he’s ‘pissed off’ by COVID being called ‘kung flu’ – The US Sun https://t.co/Wg26K0ZieG 🔥 bye bitch |
| 1756 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1836 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1834 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1874 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1154 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1770 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1766 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1778 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1715 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 3274 | 1 | @realDonaldTrump Blaming China is just blaming the 1st victim of this virus. Almost everywhere else has or is getting this under control. Only Trump was too incompetent.\n\n#Biden2020 |
| 1802 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1722 | 0 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC |
| 1990 | 0 | Omg this poor dog! Bless this man 💔 #StopYulin #BoycottChina it's unbelievable that they are allowing this insane torture and cruelty! #BanDCMT \nhttps://t.co/yvfNX1YZnC |
| 1622 | 0 | Omg this poor dog! Bless this man 💔 #StopYulin #BoycottChina it's unbelievable that they are allowing this insane torture and cruelty! #BanDCMT \nhttps://t.co/yvfNX1YZnC |
| 1961 | 0 | Omg this poor dog! Bless this man 💔 #StopYulin #BoycottChina it's unbelievable that they are allowing this insane torture and cruelty! #BanDCMT \nhttps://t.co/yvfNX1YZnC |
| 1488 | 0 | 🇨🇳China does not deserve to host #WorldDogShow2019 In #China criminal gangs run #DogCatMeatTrade where citizens #pets #dogs \n#cats stolen, killed & eaten corrupt #Chinese officials profit from it \n#XiJinping could stop the slaughter he wont. #BoycottChina #madeinchina\n#WDS 👿👎 https://t.co/Y3li2rPVTH |
| 3345 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3740 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3551 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3147 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3060 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3343 | 1 | Scientists in China say they discovered a new virus in pigs that has "pandemic potential." The virus is a strain of the H1N1 swine flu and has infected workers at pig farms.\n\nHowever, they say the immediate risk is low and there is no evidence of human-to-human transmission. https://t.co/2sJjQhFCjc |
| 3658 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3495 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3895 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3896 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3373 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3860 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3456 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3924 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3585 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3227 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3899 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3681 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3923 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3478 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3873 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3993 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3531 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3855 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3981 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3876 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3430 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3557 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3979 | 1 | Joe Biden lost for president in 1988 & 2008\n \nJoe Biden lost millions of manufacturing jobs to Mexico & China\n \nJoe Biden lost veteran lives waiting for care at the VA\n \nJoe Biden lost our respect when he opposed killing Osama bin Laden\n\nJoe Biden even lost in 2009 H1N1 Swine Flu https://t.co/3nD8OX45tO |
| 3891 | 1 | @RepSwalwell The entire world had to deal with the same virus. Nobody dealt with it as badly as Trump. Nobody let it crash their economy like Trump. Nobody let it kill their citizens like Trump. Whether China released it on purpose or it occurred naturally, it exposed Trump’s incompetence. |
| 3126 | 1 | @RepSwalwell The entire world had to deal with the same virus. Nobody dealt with it as badly as Trump. Nobody let it crash their economy like Trump. Nobody let it kill their citizens like Trump. Whether China released it on purpose or it occurred naturally, it exposed Trump’s incompetence. |
| 3670 | 1 | @RepSwalwell The entire world had to deal with the same virus. Nobody dealt with it as badly as Trump. Nobody let it crash their economy like Trump. Nobody let it kill their citizens like Trump. Whether China released it on purpose or it occurred naturally, it exposed Trump’s incompetence. |
| 1214 | 0 | First it was African Swine Fever plaguing China’s hogs - then an outbreak of Avian Flu (bird flu) - now a “seveer” army-worm infestation killing crops..\n\nChina can’t catch a break.\n\nhttps://t.co/75l9dODGal |
| 1558 | 0 | First it was African Swine Fever plaguing China’s hogs - then an outbreak of Avian Flu (bird flu) - now a “seveer” army-worm infestation killing crops..\n\nChina can’t catch a break.\n\nhttps://t.co/75l9dODGal |
| 1576 | 0 | #EndYulin #BanDCMT #BoycottChina if you think this protracted torture & murder can be excused for any reason-you are living in denial💔💔💔🐾@rissalipstick @GrouciDjamila @Cat_Kapow @AleZ2016 https://t.co/VdRsQmtYbx |
| 3479 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3015 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3685 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3013 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3070 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3938 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3319 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3052 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3470 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3203 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3890 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3894 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3069 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3937 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3718 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3109 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3643 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3112 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3009 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3581 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3437 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3537 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3858 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3341 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3634 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3098 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3751 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
| 3153 | 1 | No matter how much @JoeBiden trashes @realDonaldTrump’s response to the #coronavirus, Biden blasted Trump when, at the end of January, Trump restricted travel from China, the source of the virus. Biden said it reflected Trump’s “hysteria, xenophobia and fear mongering.” |
# check for users with multiple posts in the list
print('There are', top_low_compound.user_id.duplicated().sum(), 'users with multiple posts among the 200 tweets with the lowest compound sentiment scores.')
top_low_compound[top_low_compound.user_id.duplicated(keep=False)][['text','after','location','user_id']]
There are 4 users with multiple posts among the 200 tweets with the lowest compound sentiment scores.
| text | after | location | user_id | |
|---|---|---|---|---|
| 3419 | @sortbinue It was from china, but *trumps incompetence, and lies has changed that.\n\nIt is now the #TrumpVirus ,\nIt is through his inaction, and Republicans that the virus mutated and poses a bigger threat to the World. \n#TrumpLiesPeopleDie https://t.co/2xs1tiN2rf | 1 | Middle of the USA, and glad | 1075887974901006336 |
| 3423 | @sortbinue It was from china, but *trumps incompetence, and lies has changed that.\n\nIt is now the #TrumpVirus ,\nIt is through his inaction, and Republicans that the virus mutated and poses a bigger threat to the World. \n#TrumpLiesPeopleDie https://t.co/2xs1tiN2rf | 1 | Middle of the USA, and glad | 1075887974901006336 |
| 1668 | #StopYulin #StopTheTorture #China This is cruel evilness & must end! #BoycottChina until they stop this horror! #YulinDogMeatFestival\n#Yulin Festival China: How Dogs & Cats Are #Tortured To Death “To Make Meat Tastier” https://t.co/qdaWRIMnCG via @AnonHQ | 0 | California, USA | 1397215910 |
| 1791 | @Smaulgld China has much worse problems like needing to buy food after massive flooding has killed a lot of their crops lately. Plus, the swine flu, bird flu and army worm attacking their grains. Also, the monetary inflation there. Higher food prices coming soon. | 0 | DC/Northern VA | 85016689 |
| 1496 | @Change @LisaVanderpump @RepHastingsFL Signed previously and YES, The WHOLE W🌎RLD Wants This Hell, Horror and Heartbreak To End Already!!!\nSend Emails To @realDonaldTrump👇\nhttps://t.co/WYzQGtlK7D\n#ChinaTradeDeal\n#YulinDogMeatFestival \n#STOPYULIN\n#China\n#EndDogMeatTrade\n#BoycottChina https://t.co/rx2R7A9XWf | 0 | Taured | 2319245998 |
| 1874 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC | 0 | DC/Northern VA | 85016689 |
| 1990 | Omg this poor dog! Bless this man 💔 #StopYulin #BoycottChina it's unbelievable that they are allowing this insane torture and cruelty! #BanDCMT \nhttps://t.co/yvfNX1YZnC | 0 | California, USA | 1397215910 |
| 1489 | 🇨🇳China does not deserve to host #WorldDogShow2019 In #China criminal gangs run #DogCatMeatTrade where citizens #pets #dogs \n#cats stolen, killed & eaten corrupt #Chinese officials profit from it \n#XiJinping could stop the slaughter he wont. #BoycottChina #madeinchina\n#WDS 👿👎 https://t.co/Y3li2rPVTH | 0 | Taured | 2319245998 |
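As a side note on the pandas semantics relied on above: `duplicated()` with the default `keep='first'` flags only the repeats (so `.sum()` counts extra rows), while `keep=False` flags every member of a duplicate group, which is what lets us display all posts from repeat users. A minimal toy illustration:

```python
import pandas as pd

# Toy frame standing in for top_low_compound: users 1 and 3 post repeatedly.
df = pd.DataFrame({'user_id': [1, 1, 2, 3, 3, 3]})

# Default keep='first': only the repeats are flagged.
n_repeats = df.user_id.duplicated().sum()

# keep=False: every row belonging to a duplicated user is flagged.
all_dup_rows = df[df.user_id.duplicated(keep=False)]

print(n_repeats)          # 3 (one extra row for user 1, two for user 3)
print(len(all_dup_rows))  # 5 (both rows of user 1 and all three rows of user 3)
```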
def func(top_low_compound):
    # summarize the self-reported locations: count distinct values and show the 10 most common
    print(len(top_low_compound.location.unique()), "distinct user location of the 200 entries.")
    print(pd.DataFrame(top_low_compound.location.value_counts()).iloc[:10])
# the locations of duplicate texts and overall tweets with the lowest compound score
func(low_comp_duplicates)
func(top_low_compound)
53 distinct user location of the 200 entries.
location
United States 7
Texas, USA 2
Florida, USA 2
California, USA 2
England 1
New Jersey, USA 1
Ocho Rios, Jamaica 1
纽约 1
Santa Fe, TX 1
Edinboro, PA 1
114 distinct user location of the 200 entries.
location
United States 7
California, USA 2
DC/Northern VA 2
New Jersey, USA 2
London, England 2
Florida, USA 2
Texas, USA 2
India 2
Middle of the USA, and glad 2
Taured 2
The user location field is self-reported and therefore not reliable, but it is still worth a look: most of the locations we found are in the USA.
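Since the location field is free text, one way to make the "mostly USA" observation more concrete would be a coarse keyword mapping. The helper below (`is_probably_usa` and its `US_HINTS` keyword list are hypothetical, not part of our pipeline, and the hints are only a rough assumption) sketches the idea:

```python
import pandas as pd

# Hypothetical, very rough hints that a self-reported location is in the US.
US_HINTS = ('usa', 'united states', ', tx', ', pa', ', ca', 'washington dc',
            'california', 'florida', 'texas', 'new jersey')

def is_probably_usa(location):
    """Coarse guess: does the free-text location mention a US hint?"""
    if not isinstance(location, str):
        return False
    loc = location.lower()
    return any(hint in loc for hint in US_HINTS)

# Sample of location strings from the tables above.
locations = pd.Series(['United States', 'Texas, USA', 'England', '纽约',
                       'Santa Fe, TX', 'Ocho Rios, Jamaica'])
print(locations.map(is_probably_usa).sum())  # → 3
```

Keyword matching like this misclassifies joke locations ("Taured") and non-English strings, so it is only a first-pass filter, not a substitute for proper geocoding.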
def plot_compare_user(top_low_compound):
    # group by user and average the numeric columns (compound_score and the popularity metrics);
    # numeric_only=True skips the text columns, and passing a column name to mean() is a bug
    comp_compare = top_low_compound.groupby('user_id').mean(numeric_only=True)
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 6), constrained_layout=True)
    sns.scatterplot(data=comp_compare, x='compound_score', y='followers_count', ax=axes[0][0], color='b', label='# of followers')
    sns.scatterplot(data=comp_compare, x='compound_score', y='favourites_count', ax=axes[0][1], color='r', label='# of favorites')
    sns.scatterplot(data=comp_compare, x='compound_score', y='friends_count', ax=axes[0][2], color='g', label='# of friends')
    sns.scatterplot(data=comp_compare, x='compound_score', y='retweet_count', ax=axes[1][0], color='orange', label='retweet_count')
    sns.scatterplot(data=comp_compare, x='compound_score', y='favorite_count', ax=axes[1][1], color='grey', label='favorite_count')
    sns.scatterplot(data=comp_compare, x='compound_score', y='reply_count', ax=axes[1][2], color='purple', label='reply_count')
    plt.show()
plot_compare_user(top_low_compound)
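The trends in these scatter plots could also be summarized numerically. As a sketch (not part of the original analysis, and with made-up per-user values standing in for `comp_compare`), Spearman rank correlation is a reasonable choice because follower and retweet counts are heavy-tailed:

```python
import pandas as pd

# Hypothetical per-user averages standing in for comp_compare.
comp_compare = pd.DataFrame({
    'compound_score':  [-0.95, -0.90, -0.85, -0.80, -0.75],
    'followers_count': [120, 80, 300, 50, 40],
    'retweet_count':   [10, 5, 50, 2, 1],
})

# Rank-based correlation between sentiment and each popularity metric.
corr = comp_compare.corr(method='spearman')['compound_score'].drop('compound_score')
print(corr)
```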
# take a look at the outliers
top_low_compound.sort_values(by = 'followers_count',ascending=False)[['text','compound_score','after','name','location']].head(1)
| text | compound_score | after | name | location | |
|---|---|---|---|---|---|
| 1836 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC | -0.891 | 0 | Santiago Capital | San Francisco |
top_low_compound.sort_values(by = 'favorite_count',ascending=False)[['text','compound_score','after','name','location']].head(1)
| text | compound_score | after | name | location | |
|---|---|---|---|---|---|
| 1874 | #China has increasing amounts of flooding, same as the Midwest in the US, which will hurt their grain crops. Add this to the African swine flu, bird flu and army worm attacking their grains and you have a developing #foodcrisis. Higher food prices coming. https://t.co/sNGFFJtOxC | -0.891 | 0 | Jason Burack | DC/Northern VA |
The most popular tweet is not actually anti-Asian.
# get the top 200 tweets with the highest compound score
top_high_compound = data.sort_values(by='compound_score', ascending=False).head(200)
# check for duplicate texts (high_comp_duplicates must be defined before func is called on it)
print('number of duplicate text entries in the top high compound sentiment score:', top_high_compound.text.duplicated().sum())
high_comp_duplicates = top_high_compound[top_high_compound.text.duplicated(keep='last')]
func(high_comp_duplicates)
high_comp_duplicates[['after','text']]
number of duplicate text entries in the top high compound sentiment score: 88
65 distinct user location of the 200 entries.
location
Italia 2
🌎 1
Devizes, England 1
Washington DC 1
Nova Scotia, Canada 1
planet Earth 1
Ottawa, Ontario, Canada 1
South West, England 1
Cardiff, Wales 1
Taured 1
| after | text | |
|---|---|---|
| 2680 | 1 | RT @Jack27688344: #OneCountryTwoSystems 1997 - 2020 🕯\nBut #HongKongers won't Stop to fight !!\nWE WANT FREEDOM and FREEDOM Is WHAT Will WE G… |
| 2987 | 1 | RT @Jack27688344: #OneCountryTwoSystems 1997 - 2020 🕯\nBut #HongKongers won't Stop to fight !!\nWE WANT FREEDOM and FREEDOM Is WHAT Will WE G… |
| 2985 | 1 | RT @Jack27688344: #OneCountryTwoSystems 1997 - 2020 🕯\nBut #HongKongers won't Stop to fight !!\nWE WANT FREEDOM and FREEDOM Is WHAT Will WE G… |
| 2399 | 1 | RT @Jack27688344: #OneCountryTwoSystems 1997 - 2020 🕯\nBut #HongKongers won't Stop to fight !!\nWE WANT FREEDOM and FREEDOM Is WHAT Will WE G… |
| 1644 | 0 | @spaqx @ola_dayo_Virus @Biyatife @Yincar @Muwafaqd @SaharaReporters the people of Russia is happy with Putin ,they call him "pride of Russia ",so as Dubai, Qatar, China,Israel, our own close African brother Rwandan they're all performing well and their people are happy with them. |
| 1762 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1868 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1871 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1885 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1934 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1935 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1969 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1933 | 0 | Is it also great for these Muslim leaders, if China burns the Islamic holy book, the Qur’an? Is it also great for them to accept China’s perception that Indonesia is a country infested with an Islamic virus among other 27 (Muslim) countries? #Indonesia \nhttps://t.co/tMVPP02hed |
| 1960 | 0 | Rest in peace, dear dogs 💔we will never stop fighting for you! #StopYulin #BoycottChina 🚫 #YulinDogMeatFestival #StopYulinForever 🐾 https://t.co/ihT7VX95Cc |
| 3191 | 1 | My new book just went up on Amazon a few hours ago — I haven’t tweeted about it or emailed anyone or made a video yet. I was planning all that for Thursday. But it’s already a best-seller! \n\nI guess people really want to hear the other side of the story. https://t.co/0grhp5hjn0 https://t.co/3yP3zcqRzT |
| 621 | 0 | RT @BradfordLitFest: When they call you a Paki tell them oh have you been, I hear it is a beautiful country - a powerful poem by @nadineais… |
| 1491 | 0 | @Rehflocke @AnnelieMoser @michelarandy @dannahy_tina @Maureenhommagm1 @Princesse106 @endyulinfest1 @JustAnimals_ @MaryBittel @Cat_Kapow @VoiceofVietnam @Dikte17 @GrouciDjamila @jijmpel @Lanaroske1 @Nenagh9 @SandraK93322487 @Sally8229650811 @Barbann56 @YarosisNancy @ruthmen @DebbieMcqueen11 @oimaco8 @semperfiiam @PaulCBS12 @qa_angus @bergwolf12 @SusanDuncanolp @vsena007 @DogsofYulin @ChinaDaily @CGTNOfficial PLEASE DON'T FORGET !!!!!!\n\n#NeverForget #Coke \n\n#china HELL FOR ANIMALS IN THE EARTH\n\n#BOYCOTTCHINA\n\nBASTARDS!!!! 🤬👹💩 https://t.co/NlDZpUpUBm |
| 1577 | 0 | #YulinDogMeatFestival \nWhat kind of a world are we living in when our Governments want to trade with a country who burns dogs alive and laugh #BoycottChina #RT @nelufar @realDonaldTrump @eucopresident @theresa_may @MailOnline @HuffPostUK @thetimes @TelegraphNews @Channel4 https://t.co/7MrGnajEbK |
# check the users who have several posts listed
print('There are', top_high_compound.user_id.duplicated().sum(), 'users that have several posts listed with highest compound sentiment score.')
func(top_high_compound)
# keep=False flags every row of a duplicated user, so all of their posts are shown
top_high_compound[top_high_compound.user_id.duplicated(keep=False)][['text', 'after', 'location', 'user_id']]
There are 3 users that have several posts listed with highest compound sentiment score.
136 distinct user location of the 200 entries.
location
Lagos, Nigeria 3
New Delhi, India 2
Italia 2
Cuttack, India 2
Indiana, USA 2
India 2
Hong Kong 2
Dorset, Uk 2
St Louis, MO 1
Global Citizen 1
| text | after | location | user_id | |
|---|---|---|---|---|
| 2090 | Rainbow Hoop Earrings - Statement Earrings - Shrinky Dink Jewellery - Colourful Gift - Letterbox Gifts - Fun Birthday Gift For Her. https://t.co/CEalJaM5CB #Etsy #Slumbermonkey #ShrinkDinks https://t.co/PGVJQz35QH | 1 | Dorset, Uk | 1174549344 |
| 974 | @sardesairajdeep Respect paki emotion and win. Comon india, england is rising from ashes. | 0 | Cuttack, India | 3541223066 |
| 956 | @News18India Respect paki emotion and win. Comon india, england is rising from ashes. | 0 | Cuttack, India | 3541223066 |
| 95 | Goldfinch Bird Pin, Yellow Bird Brooch, Gift For Nature Lover, Shrinky Dink Jewellery. https://t.co/BtdU2fH3Ok #Slumbermonkey #Etsy #GoldfinchBrooch https://t.co/PtMadj7aPK | 0 | Dorset, Uk | 1174549344 |
| 1476 | @Rehflocke @AnnelieMoser @michelarandy @dannahy_tina @Maureenhommagm1 @Princesse106 @endyulinfest1 @JustAnimals_ @MaryBittel @Cat_Kapow @VoiceofVietnam @Dikte17 @GrouciDjamila @jijmpel @Lanaroske1 @Nenagh9 @SandraK93322487 @Sally8229650811 @Barbann56 @YarosisNancy @ruthmen @DebbieMcqueen11 @oimaco8 @semperfiiam @PaulCBS12 @qa_angus @bergwolf12 @SusanDuncanolp @vsena007 @DogsofYulin @ChinaDaily @CGTNOfficial PLEASE DON'T FORGET !!!!!!\n\n#NeverForget #Coke \n\n#china HELL FOR ANIMALS IN THE EARTH\n\n#BOYCOTTCHINA\n\nBASTARDS!!!! 🤬👹💩 https://t.co/NlDZpUpUBm | 0 | NaN | 2385290635 |
| 1577 | #YulinDogMeatFestival \nWhat kind of a world are we living in when our Governments want to trade with a country who burns dogs alive and laugh #BoycottChina #RT @nelufar @realDonaldTrump @eucopresident @theresa_may @MailOnline @HuffPostUK @thetimes @TelegraphNews @Channel4 https://t.co/7MrGnajEbK | 0 | NaN | 2385290635 |
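The duplicate check above relies on pandas' `duplicated`; here is a minimal sketch of the pattern on hypothetical toy data (only the `user_id` and `text` column names mirror the project's dataframe; the values are invented):

```python
import pandas as pd

# toy frame: two users, one of whom ("u1") posted twice
df = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1'],
    'text': ['post a', 'post b', 'post c'],
})

# duplicated() marks only the repeats, so .sum() counts the "extra" posts
n_repeat_users = df.user_id.duplicated().sum()

# keep=False marks *every* row of a duplicated user, which is what we want
# when listing all of a repeat poster's tweets side by side
all_rows = df[df.user_id.duplicated(keep=False)]
```

With `keep=False` both of u1's rows are returned; with the default `keep='first'` only the second occurrence is flagged.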
plot_compare_user(top_high_compound)
# take a look at the outliers
top_high_compound.sort_values(by='followers_count', ascending=False)[['text', 'compound_score', 'after', 'name', 'location']].head(5)
| text | compound_score | after | name | location | |
|---|---|---|---|---|---|
| 739 | @Dink_lay I wish to express my sincere apology for the inconvenience Marisa. For security purposes, please verify: Confirmation Number, Full Name on the Reservation, Email Address & Flight Number. Please DM for privacy. ALS | 0.8658 | 0 | Delta | Global |
| 3207 | My new book just went up on Amazon a few hours ago — I haven’t tweeted about it or emailed anyone or made a video yet. I was planning all that for Thursday. But it’s already a best-seller! \n\nI guess people really want to hear the other side of the story. https://t.co/0grhp5hjn0 https://t.co/3yP3zcqRzT | 0.8418 | 1 | Ezra Levant 🍁 | Canada and the world |
| 3144 | My new book just went up on Amazon a few hours ago — I haven’t tweeted about it or emailed anyone or made a video yet. I was planning all that for Thursday. But it’s already a best-seller! \n\nI guess people really want to hear the other side of the story. https://t.co/0grhp5hjn0 https://t.co/3yP3zcqRzT | 0.8418 | 1 | Allum Bokhari | Washington, DC |
| 1999 | Rest in peace, dear dogs 💔we will never stop fighting for you! #StopYulin #BoycottChina 🚫 #YulinDogMeatFestival #StopYulinForever 🐾 https://t.co/ihT7VX95Cc | 0.8441 | 0 | Bean Counter | Charlotte NC area |
| 1491 | @Rehflocke @AnnelieMoser @michelarandy @dannahy_tina @Maureenhommagm1 @Princesse106 @endyulinfest1 @JustAnimals_ @MaryBittel @Cat_Kapow @VoiceofVietnam @Dikte17 @GrouciDjamila @jijmpel @Lanaroske1 @Nenagh9 @SandraK93322487 @Sally8229650811 @Barbann56 @YarosisNancy @ruthmen @DebbieMcqueen11 @oimaco8 @semperfiiam @PaulCBS12 @qa_angus @bergwolf12 @SusanDuncanolp @vsena007 @DogsofYulin @ChinaDaily @CGTNOfficial PLEASE DON'T FORGET !!!!!!\n\n#NeverForget #Coke \n\n#china HELL FOR ANIMALS IN THE EARTH\n\n#BOYCOTTCHINA\n\nBASTARDS!!!! 🤬👹💩 https://t.co/NlDZpUpUBm | 0.7670 | 0 | sígueme y te sigo | Taured |
top_high_compound.sort_values(by='retweet_count', ascending=False)[['text', 'compound_score', 'after', 'name', 'location']].head(5)
| text | compound_score | after | name | location | |
|---|---|---|---|---|---|
| 3207 | My new book just went up on Amazon a few hours ago — I haven’t tweeted about it or emailed anyone or made a video yet. I was planning all that for Thursday. But it’s already a best-seller! \n\nI guess people really want to hear the other side of the story. https://t.co/0grhp5hjn0 https://t.co/3yP3zcqRzT | 0.8418 | 1 | Ezra Levant 🍁 | Canada and the world |
| 266 | At dinner on holiday and overheard an English guy saying proudly how his first two court experiences was for ‘gay bashing’ and ‘paki bashing’. 😪Please donate 👇 https://t.co/Z68kEAReOj | 0.8225 | 0 | Finlay McFarlane | South Queensferry, Scotland |
| 2392 | (luke thomas feverishly creating the backstory of a wario-style nemesis for robin black on his laptop) he will proclaim dink instead of bink, ha ha ha | 0.8126 | 1 | Hektic_One | Long Island, NY |
| 555 | @toy061707 @CynthiaJPatag Paki define nga kung ano pinanalo natin sa UNCLOS? After all the millions we spent! Please kindly explain and enlighten us what have we won. | 0.9100 | 0 | Dottie IC | London, England |
| 510 | @ paki fans, thanks for the sweet gesture but next time se sirf apni team ko support karna please ☺️ | 0.8573 | 0 | S. | India |
top_high_compound.sort_values(by='favorite_count', ascending=False)[['text', 'compound_score', 'after', 'name', 'location']].head(5)
| text | compound_score | after | name | location | |
|---|---|---|---|---|---|
| 3207 | My new book just went up on Amazon a few hours ago — I haven’t tweeted about it or emailed anyone or made a video yet. I was planning all that for Thursday. But it’s already a best-seller! \n\nI guess people really want to hear the other side of the story. https://t.co/0grhp5hjn0 https://t.co/3yP3zcqRzT | 0.8418 | 1 | Ezra Levant 🍁 | Canada and the world |
| 2389 | My heart 💓 thank you, I’m so happy to be part of the Dink Fam💓 https://t.co/LshpO3JrYB | 0.9089 | 1 | Hol 🌹⚡️ | Dink Fam |
| 2392 | (luke thomas feverishly creating the backstory of a wario-style nemesis for robin black on his laptop) he will proclaim dink instead of bink, ha ha ha | 0.8126 | 1 | Hektic_One | Long Island, NY |
| 2302 | so i just looked and saw i missed hitting exactly 200 dinks, but thank you all so much dink fam🥺💛. i have never been part of a more loving, supportive, and accepting community. y’all dont really know anything about me, so i was thinking of letting you— | 0.9438 | 1 | Kaitlynn | saturn🪐 |
| 266 | At dinner on holiday and overheard an English guy saying proudly how his first two court experiences was for ‘gay bashing’ and ‘paki bashing’. 😪Please donate 👇 https://t.co/Z68kEAReOj | 0.8225 | 0 | Finlay McFarlane | South Queensferry, Scotland |
The user locations are more diverse in the high-compound group.
Overlapping topics: Hong Kong issues, Islam-related issues, and the Yulin Dog Meat Festival (animal protection).
"Dink" is not really an anti-Asian keyword: most of the tweets with the highest compound scores describe warm interactions within the "Dink Fam" community.
</font>
import numpy as np
# Drop identifier and free-text columns that are not useful for correlation
compare = senti_to_use.drop(
    columns=['text_id', 'text','user_id','name','cleaned_text','created_at','user_created_at','retweeted'])
# Cast the discrete columns to pandas categoricals
def categorize(col):
    compare[col] = compare[col].astype("category")
types = ['source','possibly_sensitive','filter_level','after']
for t in types:
    categorize(t)
plt.figure(figsize=(16, 6))
data_plot= compare[compare['after']==0].drop(columns = ['after'])
mask = np.triu(np.ones_like(data_plot.corr(), dtype=bool))
plot1 = sns.heatmap(data_plot.corr(),mask=mask,vmin=-1, vmax=1, annot=True,cmap='BrBG')
plot1.set_title('Correlation Heatmap before Covid-19', fontdict={'fontsize':18}, pad=16);
plt.figure(figsize=(16, 6))
data_plot= compare[compare['after']==1].drop(columns = ['after'])
mask = np.triu(np.ones_like(data_plot.corr(), dtype=bool))
plot2 = sns.heatmap(data_plot.corr(),mask=mask,vmin=-1, vmax=1, annot=True,cmap='BrBG')
plot2.set_title('Correlation Heatmap after Covid-19', fontdict={'fontsize':18}, pad=16);
Comparing the correlations before and after the COVID-19 outbreak shows some notable changes: for example, there are far fewer retweets, and more users have high favorite counts after COVID-19.
However, the overall correlations are too weak to establish a statistically significant relationship between the sentiment scores and the tweet features.
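To back the "too weak" judgment quantitatively, each heatmap entry could be paired with a p-value. A minimal sketch (the data here is synthetic; the real inputs would be two numeric columns of `compare`, whose names are assumptions):

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-ins for two numeric columns of `compare` (names are assumptions)
rng = np.random.default_rng(0)
compound_score = rng.normal(size=200)
favorite_count = rng.poisson(3, size=200).astype(float)

# pearsonr returns the correlation coefficient and a two-sided p-value,
# so a weak heatmap entry can be checked for significance directly
r, p = pearsonr(compound_score, favorite_count)
print(f"r = {r:.3f}, significant at 5%: {p < 0.05}")
```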
tweets before Covid, tweets after Covid, submissions before Covid, and submissions after Covid, to further explore the topics on each platform and their changes over time.
!pip install gensim==3.8.3
Collecting gensim==3.8.3
Successfully installed gensim-3.8.3
!pip install pyLDAvis==3.2.2
Collecting pyLDAvis==3.2.2
Successfully installed funcy-1.15 numexpr-2.7.3 pyLDAvis-3.2.2
# Import libraries
import nltk
nltk.download('stopwords')
import gensim, spacy
import gensim.corpora as corpora
from nltk.corpus import stopwords
from spacy.lang.en.stop_words import STOP_WORDS
import pandas as pd
import re
import time
import pyLDAvis
import pyLDAvis.gensim
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
%matplotlib inline
# Import the pickle
text_yj = pd.read_pickle('clean_tokens.pkl')
# Prepare the corpus
tokens = text_yj.tokens_cleaned.dropna()
dictionary = corpora.Dictionary(tokens)
doc_mtrx = [dictionary.doc2bow(text) for text in tokens]
# Build the LDA model
num_topics = 5
LDA_overall = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics = num_topics, id2word=dictionary,random_state=100)
LDA_overall.save('overall.model') # save model
LDA_overall.print_topics()
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[(0, '0.009*"paint" + 0.006*"rare" + 0.004*"wheel" + 0.004*"china" + 0.004*"time" + 0.003*"year" + 0.003*"people" + 0.003*"come" + 0.003*"like" + 0.003*"white"'),
 (1, '0.022*"china" + 0.014*"virus" + 0.011*"paki" + 0.008*"new" + 0.008*"flu" + 0.007*"country" + 0.007*"pandemic" + 0.006*"people" + 0.005*"world" + 0.005*"like"'),
 (2, '0.007*"paint" + 0.004*"rare" + 0.004*"go" + 0.004*"trump" + 0.004*"like" + 0.004*"wheel" + 0.003*"people" + 0.003*"say" + 0.003*"know" + 0.003*"time"'),
 (3, '0.011*"like" + 0.006*"time" + 0.005*"know" + 0.005*"want" + 0.005*"look" + 0.005*"go" + 0.005*"get" + 0.005*"feel" + 0.004*"say" + 0.004*"good"'),
 (4, '0.007*"paint" + 0.007*"crimson" + 0.006*"white" + 0.005*"rare" + 0.004*"blue" + 0.004*"sky" + 0.004*"purple" + 0.004*"black" + 0.004*"point" + 0.003*"saffron"')]
# Plot using pyLDAvis
pyLDAvis.enable_notebook() # enable rendering in the notebook
plot = pyLDAvis.gensim.prepare(LDA_overall, doc_mtrx, dictionary)
# Save pyLDA plot as html file
pyLDAvis.save_html(plot, 'LDA_overall.html')
plot
Topic keywords such as like, want, know, and look are commonly used words, so it is hard to infer their exact context from them. Words such as china, people, go, virus, and new are associated with the coronavirus.
# Split documents
# before covid - twitter
twitter_before = text_yj[(text_yj.before == 1) & (text_yj.platform == 'twitter')].dropna()
# before covid - reddit
reddit_before = text_yj[(text_yj.before == 1) & (text_yj.platform == 'reddit')].dropna()
# after covid - twitter
twitter_after = text_yj[(text_yj.before == 0) & (text_yj.platform == 'twitter')].dropna()
# after covid - reddit
reddit_after = text_yj[(text_yj.before == 0) & (text_yj.platform == 'reddit')].dropna()
# Prepare corpus - Twitter before Covid
dictionary = corpora.Dictionary(twitter_before.tokens_cleaned)
doc_mtrx = [dictionary.doc2bow(text) for text in twitter_before.tokens_cleaned]
num_topics = 5
LDA_twtB = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics = num_topics, id2word=dictionary,iterations=50, random_state=100)
LDA_twtB.save('twitterBeforeCovid.model') # save model
# Print top 10 topics
LDA_twtB.print_topics()
# Prepare corpus - Twitter after Covid
dictionary = corpora.Dictionary(twitter_after.tokens_cleaned)
doc_mtrx = [dictionary.doc2bow(text) for text in twitter_after.tokens_cleaned]
num_topics = 5
LDA_twtA = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics = num_topics, id2word=dictionary,iterations=50, random_state=100)
LDA_twtA.save('twitterAfterCovid.model') # save model
# Print top 10 topics
LDA_twtA.print_topics()
[(0, '0.032*"china" + 0.032*"virus" + 0.016*"new" + 0.016*"flu" + 0.015*"world" + 0.013*"go" + 0.013*"speak" + 0.013*"entire" + 0.013*"label" + 0.013*"completely"'),
 (1, '0.039*"virus" + 0.035*"china" + 0.022*"pandemic" + 0.019*"new" + 0.017*"people" + 0.016*"come" + 0.014*"travel" + 0.014*"like" + 0.013*"angry" + 0.013*"country"'),
 (2, '0.038*"china" + 0.027*"virus" + 0.020*"new" + 0.020*"flu" + 0.019*"biden" + 0.017*"lose" + 0.015*"swine" + 0.013*"joe" + 0.011*"pandemic" + 0.008*"potential"'),
 (3, '0.031*"china" + 0.022*"flu" + 0.018*"virus" + 0.018*"like" + 0.017*"paki" + 0.014*"trump" + 0.013*"mask" + 0.012*"retweet" + 0.012*"pandemic" + 0.011*"new"'),
 (4, '0.027*"paki" + 0.023*"dink" + 0.013*"china" + 0.012*"people" + 0.007*"property" + 0.006*"flu" + 0.005*"virus" + 0.005*"woman" + 0.005*"pandemic" + 0.005*"like"')]
# Helper function to get the top 10 words with probabilities for each topic
def dict_topics(corpus):
    dictionary = corpora.Dictionary(corpus)
    doc_mtrx = [dictionary.doc2bow(text) for text in corpus]
    num_topics = 5
    LDA = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics=num_topics, id2word=dictionary, iterations=50, random_state=100)
    top_topics = LDA.top_topics(doc_mtrx, topn=10)
    alltopics = []
    for i in range(len(top_topics)):
        topics = dict(top_topics[i][0])  # (probability, word) pairs for topic i
        topics = dict([(value, key) for key, value in topics.items()])  # invert to word -> probability
        alltopics.append(topics)
    return alltopics
# Helper function to calculate the similarity score (probability aggregation)
def get_similarity_score(before, after):
    tbeforeSet = set(before)
    tafterSet = set(after)
    common = tbeforeSet.intersection(tafterSet)  # get the set of common words for the two topics
    similarity_score = 0  # initialize the similarity score
    for word in common:
        prob_diff = abs(before[word] - after[word])  # difference in probabilities for a common word
        similarity_score = similarity_score + (1 - prob_diff)  # accumulate the similarity score for each common word
    return similarity_score
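To make the score's behaviour concrete, here is a toy call (the word-to-probability dicts are invented, and the function is restated compactly so the snippet is self-contained). Each shared word contributes up to 1, so the score grows with the number of overlapping words and is not normalized:

```python
# Compact restatement of get_similarity_score so the snippet is self-contained
def get_similarity_score(before, after):
    common = set(before).intersection(set(after))
    score = 0
    for word in common:
        score += 1 - abs(before[word] - after[word])
    return score

# Invented word -> probability dicts for two topics
topic_before = {"china": 0.03, "virus": 0.02, "flu": 0.01}
topic_after = {"china": 0.04, "virus": 0.02, "mask": 0.01}

# Shared words: china contributes 1 - 0.01 = 0.99, virus contributes 1 - 0 = 1.0
print(round(get_similarity_score(topic_before, topic_after), 2))  # → 1.99
```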
# Helper function for wordcloud
def get_wordcloud(topiclist, title, colormap):
    fig, ax = plt.subplots(5, 2, figsize=(7, 10))
    rows = 5
    row = 0
    col = 0
    for i in range(len(topiclist)):
        wordcloud = WordCloud(background_color='grey', colormap=colormap, prefer_horizontal=1, collocations=False)
        wordcloud.generate_from_frequencies(frequencies=topiclist[i])
        ax[row][col].imshow(wordcloud)
        ax[row][col].grid(False)
        ax[row][col].axis("off")
        if i < 5:
            ax[row][col].set_title('Topic ' + str(i + 1))
        else:
            ax[row][col].set_title('Topic ' + str(i - 4))
        row = row + 1
        if row == rows:
            row = 0
            col = col + 1
    fig.suptitle(title, fontsize=14, position=(0.5, 1.05))
    fig.tight_layout()
    #plt.savefig(f"twitter.png", bbox_inches='tight')
    plt.show()
# Prepare data for plotting and similarity calculation
twords_before = dict_topics(twitter_before.tokens_cleaned)
twords_after = dict_topics(twitter_after.tokens_cleaned)
rwords_before = dict_topics(reddit_before.tokens_cleaned)
rwords_after = dict_topics(reddit_after.tokens_cleaned)
twitter_dict = twords_before + twords_after
reddit_dict = rwords_before + rwords_after
# Get the similarity score comparison for each platform
num_topics = 5
allList_twitter = []
for i in range(num_topics):
    similarityList = []
    for j in range(num_topics):
        similarityList.append(get_similarity_score(twords_before[i], twords_after[j]))
    allList_twitter.append(similarityList)
allList_reddit = []
for i in range(num_topics):
    similarityList = []
    for j in range(num_topics):
        similarityList.append(get_similarity_score(rwords_before[i], rwords_after[j]))
    allList_reddit.append(similarityList)
# Save the result in a dataframe
compare_twitter = pd.DataFrame(allList_twitter,columns=['topic1_after','topic2_after','topic3_after','topic4_after','topic5_after'],
index=['topic1_before','topic2_before','topic3_before','topic4_before','topic5_before'])
compare_reddit = pd.DataFrame(allList_reddit,columns=['topic1_after','topic2_after','topic3_after','topic4_after','topic5_after'],
index=['topic1_before','topic2_before','topic3_before','topic4_before','topic5_before'])
! pip install wordcloud
Collecting wordcloud
Successfully installed wordcloud-1.8.1
from wordcloud import WordCloud, STOPWORDS
# Draw wordcloud
get_wordcloud(twitter_dict, 'Before COVID19 --Twitter-- After COVID19', 'Blues')
# Display similarity
cm1 = sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True)
s = compare_twitter.style.set_caption("Twitter topic similarity - before & after COVID19")\
.background_gradient(cmap=cm1)
s
| | topic1_after | topic2_after | topic3_after | topic4_after | topic5_after |
|---|---|---|---|---|---|
| topic1_before | 0.988972 | 0.000000 | 0.993471 | 0.992332 | 0.997759 |
| topic2_before | 2.984235 | 2.977314 | 1.986472 | 3.968749 | 4.952742 |
| topic3_before | 1.981772 | 0.982283 | 1.976625 | 1.982991 | 1.988612 |
| topic4_before | 0.976047 | 0.973208 | 0.978333 | 1.944922 | 1.973319 |
| topic5_before | 0.991020 | 0.988181 | 0.993306 | 1.992522 | 1.979080 |
Before covid:
After covid:
We also compare the similarity between the topics before and after Covid; topic pairs with higher scores share more overlapping content. The least similar pair is topic 1 before Covid and topic 2 after Covid, which cover quite different content. Topic 2 after Covid is also the least similar to all of the earlier topics, suggesting that it contains new content around china, virus, flu, and biden. Further analysis could examine the original tweets associated with this topic.
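Because the similarity score is an unnormalized sum over shared words, topic pairs with more overlapping words automatically score higher. A normalized sanity check, sketched here with invented top-word sets, is the Jaccard index:

```python
def jaccard(before, after):
    a, b = set(before), set(after)
    # shared words over all distinct words, always in [0, 1]
    return len(a & b) / len(a | b)

# Invented top-word sets for one topic before and one topic after Covid
t1_before = {"china", "virus", "flu", "pandemic"}
t2_after = {"china", "virus", "biden", "lose"}

print(round(jaccard(t1_before, t2_after), 2))  # → 0.33 (2 shared out of 6 distinct)
```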
# Prepare corpus - Reddit before Covid
dictionary = corpora.Dictionary(reddit_before.tokens_cleaned)
doc_mtrx = [dictionary.doc2bow(text) for text in reddit_before.tokens_cleaned]
num_topics = 5
LDA_redB = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics = num_topics, id2word=dictionary,iterations=50, random_state=100)
LDA_redB.save('redditBeforeCovid.model') # save model
# Print top 10 topics
LDA_redB.print_topics()
# Prepare corpus - Reddit after Covid
dictionary = corpora.Dictionary(reddit_after.tokens_cleaned)
doc_mtrx = [dictionary.doc2bow(text) for text in reddit_after.tokens_cleaned]
num_topics = 5
LDA_redA = gensim.models.ldamodel.LdaModel(doc_mtrx, num_topics = num_topics, id2word=dictionary,iterations=50, random_state=100)
LDA_redA.save('redditAfterCovid.model') # save model
# Print top 10 topics
LDA_redA.print_topics()
[(0, '0.006*"people" + 0.006*"china" + 0.005*"know" + 0.005*"like" + 0.004*"say" + 0.003*"go" + 0.003*"time" + 0.003*"start" + 0.003*"want" + 0.003*"get"'),
 (1, '0.006*"like" + 0.005*"say" + 0.005*"time" + 0.005*"trump" + 0.005*"people" + 0.004*"point" + 0.004*"think" + 0.004*"get" + 0.003*"come" + 0.003*"china"'),
 (2, '0.007*"like" + 0.005*"china" + 0.005*"say" + 0.004*"people" + 0.004*"go" + 0.004*"think" + 0.004*"want" + 0.003*"look" + 0.003*"time" + 0.003*"know"'),
 (3, '0.008*"china" + 0.006*"like" + 0.006*"time" + 0.005*"know" + 0.004*"go" + 0.004*"people" + 0.004*"look" + 0.004*"world" + 0.003*"want" + 0.003*"say"'),
 (4, '0.005*"crimson" + 0.005*"like" + 0.005*"white" + 0.004*"people" + 0.004*"sky" + 0.004*"china" + 0.004*"blue" + 0.004*"say" + 0.004*"titanium" + 0.004*"time"')]
# Draw wordcloud
get_wordcloud(reddit_dict, 'Before COVID19 --Reddit-- After COVID19', 'Reds')
# Display similarity
cm2 = sns.cubehelix_palette(as_cmap=True)
s2 = compare_reddit.style.set_caption("Reddit topic similarity - before & after COVID19")\
.background_gradient(cmap=cm2)
s2
| | topic1_after | topic2_after | topic3_after | topic4_after | topic5_after |
|---|---|---|---|---|---|
| topic1_before | 7.991915 | 7.992171 | 6.992553 | 5.994053 | 3.997645 |
| topic2_before | 6.993531 | 6.992987 | 5.993073 | 5.994352 | 3.998465 |
| topic3_before | 5.990957 | 6.992327 | 5.991748 | 5.992882 | 4.993395 |
| topic4_before | 5.996482 | 5.995480 | 5.994292 | 4.996030 | 3.999081 |
| topic5_before | 3.994304 | 3.994272 | 3.993022 | 3.995164 | 3.996308 |
- The keywords we chose for data collection are subject to bias.
- We may misinterpret the sentiment of a post or tweet by removing all stop words (e.g., after removing "is", "isn't" is left as "nt").
- The different text lengths on Twitter and Reddit made it hard to detect topic patterns equally on the two platforms. For the Reddit data, long submissions may bias the model towards certain topics.
- To conduct pairwise comparisons, we set the same number of topics for the different corpora, but that number may not yield the best result for each corpus.